
Cyber bullying - Statistics & Facts

Social media fosters both connectivity and (cyber) conflict. This overview covers key insights on cyberbullying as well as prevention and action.

Detailed statistics

U.S. student lifetime cyber bullying victimization rate 2007-2019

U.S. states with the highest cyber bullying rate 2018-2019

U.S. cyber bullying environments 2021

Editor’s Picks: Current statistics on this topic

Cyber Crime

Cybercrime encounter rate in selected countries 2022

Cyber Bullying

U.S. internet users who have experienced cyber bullying 2021

Social Media & User-Generated Content

Facebook: hate speech content removal as of Q3 2023

Further recommended statistics

  • Cybercrime encounter rate in selected countries 2022
  • U.S. internet users who have experienced cyber bullying 2021
  • U.S. student lifetime cyber bullying victimization rate 2007-2019
  • U.S. student lifetime cyber bullying offending rate 2007-2019
  • Cyber bullying: common types of bullying 2019
  • U.S. middle and high school cyber bullying perpetration 2019
  • U.S. states with the highest cyber bullying rate 2018-2019
  • U.S. states with the lowest cyber bullying rate 2018-2019

Percentage of internet users in selected countries who have ever experienced any cybercrime in 2022

Share of adult internet users in the United States who have personally experienced online harassment as of January 2021

Lifetime cyber bullying victimization rate among middle and high school students in the United States from May 2007 to April 2019

U.S. student lifetime cyber bullying offending rate 2007-2019

Lifetime cyber bullying offending rate among middle and high school students in the United States from May 2007 to April 2019

Cyber bullying: common types of bullying 2019

Percentage of U.S. middle and high school students who were cyber bullied as of April 2019, by type of cyber bullying

U.S. middle and high school cyber bullying perpetration 2019

Prevalence of cyber bullying perpetration among middle and high school students as of April 2019

U.S. states with the highest rate of electronic bullying among students in grades 9 through 12 as of 2019

U.S. states with the lowest cyber bullying rate 2018-2019

U.S. states with the lowest rate of electronic bullying among students in grades 9 through 12 as of 2019

  • U.S. cyber bullying environments 2021
  • Share of U.S. teens who have experienced cyber bullying 2018
  • Share of U.S. teens who have experienced cyber bullying 2018, by gender
  • Teens in the U.S. who have been cyber bullied in 2018, by age and frequency
  • Teenagers in the U.S. who have tried to help someone who was cyberbullied in 2018
  • U.S. teen most common emotions when using social media 2018
  • Negative social media effects according to U.S. teens 2018, by emotional well-being
  • U.S. teens encountering hate speech on social media 2018, by type
  • Teen perspectives on negative effects of social media in the U.S. 2018
  • Reasons for online hate and harassment in the U.S. 2020
  • Impact of online hate and harassment in the U.S. 2020
  • U.S. user experiences with dating apps or websites 2019
  • U.S. negative online dating behavior encounters 2019

Online environments where cyber bullying victims in the United States have been harassed as of January 2021

Share of U.S. teens who have experienced cyber bullying 2018

Percentage of teenagers in the United States who have experienced selected types of cyber bullying as of April 2018

Share of U.S. teens who have experienced cyber bullying 2018, by gender

Percentage of teenagers in the United States who have experienced selected types of cyber bullying as of April 2018, by gender

Teens in the U.S. who have been cyber bullied in 2018, by age and frequency

Percentage of teenagers in the United States who have been cyber bullied as of April 2018, by age and frequency

Teenagers in the U.S. who have tried to help someone who was cyberbullied in 2018

Percentage of teenagers in the United States who have tried to help someone who was cyberbullied as of April 2018

U.S. teen most common emotions when using social media 2018

Most common emotions teenagers in the United States experience when using social media as of April 2018

Negative social media effects according to U.S. teens 2018, by emotional well-being

Negative social media effects according to teenagers in the United States as of April 2018, by emotional well-being

U.S. teens encountering hate speech on social media 2018, by type

Percentage of teenagers in the United States who have encountered hate speech on social media platforms as of April 2018, by type

Teen perspectives on negative effects of social media in the U.S. 2018

Leading reasons why teenagers in the United States feel that social media has a mostly negative effect on people their own age as of April 2018

Reasons for online hate and harassment in the U.S. 2020

Reasons for online hate according to online harassment victims in the United States as of January 2020

Impact of online hate and harassment in the U.S. 2020

Consequences of online hate and harassment according to internet users in the United States as of January 2020

U.S. user experiences with dating apps or websites 2019

Share of users in the United States who say their own experience of online dating sites or apps has been positive or negative as of October 2019

U.S. negative online dating behavior encounters 2019

Negative behaviors encountered on online dating sites or apps according to users in the United States as of October 2019

  • Cyber bullying awareness worldwide 2018, by country
  • Parent awareness of their child being cyber bullied 2018, by country
  • Parent awareness of children in their community being cyber bullied 2018, by country
  • Parent awareness of cyber bullying via selected platforms 2018
  • Parent awareness of cyber bullying via selected platforms 2018, by region
  • Global opinion on the severity of cyber bullying and counter-methods 2018
  • U.S. college students who think hate speech is a serious social media problem 2019
  • U.S. college students who think hate speech is a social media problem 2019, by gender

Cyber bullying awareness worldwide 2018, by country

Overall awareness of cyber bullying in select countries worldwide as of April 2018, by country

Parent awareness of their child being cyber bullied 2018, by country

Percentage of parents who report their child has been a victim of cyber bullying in 2011, 2016 and 2018, by country

Parent awareness of children in their community being cyber bullied 2018, by country

Percentage of parents who know a child in their community being victimized by cyber bullying as of April 2018, by country

Parent awareness of cyber bullying via selected platforms 2018

Knowledge of which types of online platforms are used to cyber bully children according to parents worldwide as of April 2018

Parent awareness of cyber bullying via selected platforms 2018, by region

Knowledge of which types of online platforms are used to cyber bully children according to parents worldwide as of April 2018, by region

Global opinion on the severity of cyber bullying and counter-methods 2018

Global opinion on the severity of cyber bullying and current methods to address it as of April 2018

U.S. college students who think hate speech is a serious social media problem 2019

Share of college students in the United States who think hate speech is a serious problem on social media as of December 2019

U.S. college students who think hate speech is a social media problem 2019, by gender

Share of college students in the United States who think hate speech is a serious problem on social media as of December 2019, by gender

Prevention and action

  • U.S. states with state cyber bullying laws as of January 2021
  • U.S. states with state sexting laws as of April 2019, by policy
  • Facebook: bullying and harassment content actions as of Q4 2023
  • Facebook: hate speech content removal as of Q3 2023
  • Facebook: content violation appeals as of Q3 2023
  • Number of videos removed from YouTube worldwide as of Q3 2023
  • Share of videos removed from YouTube worldwide 2019-2023, by reason
  • Distribution of video comments removed from YouTube worldwide Q3 2023, by reason

U.S. states with state cyber bullying laws as of January 2021

Number of U.S. states with state cyber bullying laws as of January 2021, by policy

U.S. states with state sexting laws as of April 2019, by policy

Number of U.S. states with state sexting laws as of April 2019, by policy

Facebook: bullying and harassment content actions as of Q4 2023

Actioned bullying and harassment content items on Facebook worldwide from 3rd quarter 2018 to 4th quarter 2023 (in millions)

Actioned hate speech content items on Facebook worldwide from 4th quarter 2017 to 3rd quarter 2023 (in millions)

Facebook: content violation appeals as of Q3 2023

User appeals submitted to Facebook regarding content removals as of 3rd quarter 2023 (in 1,000s)

Number of videos removed from YouTube worldwide as of Q3 2023

Number of videos removed from YouTube worldwide from 4th quarter 2017 to 3rd quarter 2023

Share of videos removed from YouTube worldwide 2019-2023, by reason

Distribution of videos removed from YouTube worldwide from 2nd quarter 2019 to 3rd quarter 2023, by reason

Distribution of video comments removed from YouTube worldwide Q3 2023, by reason

Distribution of removed YouTube comments worldwide as of 3rd quarter 2023, by removal reason



A Majority of Teens Have Experienced Some Form of Cyberbullying

59% of U.S. teens have been bullied or harassed online, and a similar share says it’s a major problem for people their age. At the same time, teens mostly think teachers, social media companies and politicians are failing at addressing this issue.



For the latest survey data on teens and cyberbullying, see “Teens and Cyberbullying 2022.”

Name-calling and rumor-spreading have long been an unpleasant and challenging aspect of adolescent life. But the proliferation of smartphones and the rise of social media have transformed where, when and how bullying takes place. A new Pew Research Center survey finds that 59% of U.S. teens have personally experienced at least one of six types of abusive online behaviors. 1

The most common type of harassment youth encounter online is name-calling. Some 42% of teens say they have been called offensive names online or via their cellphone. Additionally, about a third (32%) of teens say someone has spread false rumors about them on the internet, while smaller shares have had someone other than a parent constantly ask where they are, who they’re with or what they’re doing (21%) or have been the target of physical threats online (16%).

While texting and digital messaging are a central way teens build and maintain relationships, this level of connectivity may lead to potentially troubling and nonconsensual exchanges. One-quarter of teens say they have been sent explicit images they didn’t ask for, while 7% say someone has shared explicit images of them without their consent. These experiences are particularly concerning to parents. Fully 57% of parents of teens say they worry about their teen receiving or sending explicit images, including about one-quarter who say this worries them a lot, according to a separate Center survey of parents.

The vast majority of teens (90% in this case) believe online harassment is a problem that affects people their age, and 63% say this is a major problem. But majorities of young people think key groups, such as teachers, social media companies and politicians are failing at tackling this issue. By contrast, teens have a more positive assessment of the way parents are addressing cyberbullying.

These are some of the key findings from the Center’s surveys of 743 teens and 1,058 parents living in the U.S. conducted March 7 to April 10, 2018. Throughout the report, “teens” refers to those ages 13 to 17, and “parents of teens” are those who are the parent or guardian of someone in that age range.

Similar shares of boys and girls have been harassed online – but girls are more likely to be the targets of online rumor-spreading or nonconsensual explicit messages

Teen boys and girls are equally likely to be bullied online, but girls are more likely to endure false rumors, receive explicit images they didn't ask for

When it comes to the overall findings on the six experiences measured in this survey, teenage boys and girls are equally likely to experience cyberbullying. However, there are some differences in the specific types of harassment they encounter.

Overall, 60% of girls and 59% of boys have experienced at least one of six abusive online behaviors. While similar shares of boys and girls have encountered abuse, such as name-calling or physical threats online, other forms of cyberbullying are more prevalent among girls. Some 39% of girls say someone has spread false rumors about them online, compared with 26% of boys who say this.

Girls also are more likely than boys to report being the recipient of explicit images they did not ask for (29% vs. 20%). And being the target of these types of messages is an especially common experience for older girls: 35% of girls ages 15 to 17 say they have received unwanted explicit images, compared with about one-in-five boys in this age range and younger teens of both genders. 2

Online harassment does not necessarily begin and end with one specific behavior, and 40% of teens have experienced two or more of these actions. Girls are more likely than boys to have experienced several different forms of online bullying, however. Some 15% of teen girls have been the target of at least four of these online behaviors, compared with 6% of boys.

In addition to these gender differences, teens from lower-income families are more likely than those from higher-income families to encounter certain forms of online bullying. For example, 24% of teens whose household income is less than $30,000 a year say they have been the target of physical threats online, compared with 12% whose annual household income is $75,000 or more. However, teens’ experiences with these issues do not statistically differ by race or ethnicity, or by their parent’s level of educational attainment. (For details on experiences with online bullying by different demographic groups, see Appendix A .)

The likelihood of teens facing abusive behavior also varies by how often teens go online. Some 45% of teens say they are online almost constantly, and these constant users are more likely to face online harassment. Fully 67% of teens who are online almost constantly have been cyberbullied, compared with 53% of those who use the internet several times a day or less. These differences also extend to specific kinds of behaviors. For example, half of teens who are near-constant internet users say they have been called offensive names online, compared with about a third (36%) who use the internet less frequently.

A majority of teens think parents are doing a good job at addressing online harassment, but smaller shares think other groups are handling this issue effectively

Today, school officials, tech companies and lawmakers are looking for ways to combat cyberbullying. Some schools have implemented policies that punish students for harassing messages even when those exchanges occur off campus. Anti-bullying tools are being rolled out by social media companies, and several states have enacted laws prohibiting cyberbullying and other forms of electronic harassment. In light of these efforts, Pew Research Center asked young people to rate how key groups are responding to cyberbullying and found that teens generally are critical of the way this problem is being addressed.

A majority of teens think parents are doing a good job in addressing online harassment, but are critical of teachers, social media companies and politicians

Indeed, teens rate the anti-bullying efforts of five of the six groups measured in the survey more negatively than positively. Parents are the only group for which a majority of teens (59%) express a favorable view of their efforts.

Young people have an especially negative view of the way politicians are tackling the issue of cyberbullying – 79% of teens say elected officials are doing only a fair or poor job of addressing this problem. And smaller majorities have unfavorable views of how groups such as social media sites (66%), other users who witness harassment happening online (64%) or teachers (58%) are addressing harassment and cyberbullying.

Teens’ views on how well each of these groups is handling this issue vary little by their own personal experiences with cyberbullying – that is, bullied teens are no more critical than their non-bullied peers. And teens across various demographic groups tend to have a similar assessment of how these groups are addressing online harassment.

About six-in-ten parents worry about their own teen getting bullied online, but most are confident they can teach their teen about acceptable online behavior

Parents believe they can provide their teen with the appropriate advice to make good online decisions. Nine-in-ten parents say they are at least somewhat confident they can teach their teen how to engage in appropriate online behavior, including 45% who say they are very confident in their ability to do so.

About six-in-ten parents worry about their teen getting bullied online, exchanging explicit images, but this varies by race, ethnicity and the child's gender

But even as most parents are confident they can educate their child about proper online conduct, notable shares are concerned about the types of negative experiences their teen might encounter online. Roughly six-in-ten parents say they worry at least somewhat about their teen being harassed or bullied online (59%) or sending or receiving explicit images (57%). In each case, about one-in-four parents say they worry a lot about one of these things happening to their child.

These parental concerns tend to vary by race and ethnicity, as well as by a child’s gender. Among parents, whites and Hispanics are more likely than blacks to say they worry about their teen being cyberbullied. Hispanic parents also are more inclined than black parents to say they worry about their child exchanging explicit images. At the same time, parents of teen girls are somewhat more likely than those with a teenage boy to say they worry about their teen being bullied online (64% vs. 54%) or exchanging explicit images (64% vs. 51%). (For details on these parental concerns by demographic group, see Appendix A .)

  • Pew Research Center measured cyberbullying by asking respondents if they had ever experienced any of six online behaviors. Respondents who selected yes to one or more of these questions are considered to be targets of cyberbullying in this study. Throughout the report the terms “cyberbullying” and “online harassment” are used interchangeably.
  • A 2017 Pew Research Center survey of U.S. adults also found age and gender differences in receiving nonconsensual explicit images; women ages 18 to 29 are especially likely to encounter this behavior.



Cyberbullying: What is it and how can you stop it?

Explore the latest psychological science about the impact of cyberbullying and what to do if you or your child is a victim

  • Mental Health
  • Social Media and Internet


Cyberbullying can happen anywhere with an internet connection. While traditional, in-person bullying is still more common, data from the Cyberbullying Research Center suggest about 1 in every 4 teens has experienced cyberbullying, and about 1 in 6 has been a perpetrator. About 1 in 5 tweens, or kids ages 9 to 12, has been involved in cyberbullying (PDF, 5.57MB).

As technology advances, so do opportunities to connect with people—but unfettered access to others isn’t always a good thing, especially for youth. Research has long linked more screen time with lower psychological well-being , including higher rates of anxiety and depression. The risk of harm is higher when kids and teens are victimized by cyberbullying.

Here’s what you need to know about cyberbullying, and psychology’s role in stopping it.

What is cyberbullying?

Cyberbullying occurs when someone uses technology to demean, inflict harm, or cause pain to another person. It is “willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices.” Perpetrators bully victims in any online setting, including social media, video or computer games, discussion boards, or text messaging on mobile devices.

Virtual bullying can affect anyone, regardless of age. However, the term “cyberbullying” usually refers to online bullying among children and teenagers. It may involve name calling, threats, sharing private or embarrassing photos, or excluding others.

One bully can harass another person online or several bullies can gang up on an individual. While a stranger can incite cyberbullying, it more frequently occurs among kids or teens who know each other from school or other social settings. Research suggests bullying often happens both at school and online .

Online harassment between adults can involve different terms, depending on the relationship and context. For example, dating violence, sexual harassment, workplace harassment, and scamming—more common among adults—can all happen on the internet.

How can cyberbullying impact the mental health of myself or my child?

Any form of bullying can negatively affect the victim’s well-being, both at the time the bullying occurs and in the future. Psychological research suggests being victimized by a cyberbully increases stress and may result in anxiety and depression symptoms. Some studies find anxiety and depression increase the likelihood that adolescents will become victims of cyberbullying.

Cyberbullying can also cause educational harm, affecting a student’s attendance or academic performance, especially when bullying occurs both online and in school or when a student has to face their online bully in the classroom. Kids and teens may rely on negative coping mechanisms, such as substance use, to deal with the stress of cyberbullying. In extreme cases, kids and teens may struggle with self-harm or suicidal ideation.

How can parents talk to their children about cyberbullying?

Parents play a crucial role in preventing cyberbullying and associated harms. Be aware of what your kids are doing online, whether you check your child’s device, talk to them about their online behaviors, or install a monitoring program. Set rules about who your child can friend or interact with on social media platforms. For example, tell your child if they wouldn’t invite someone to your house, then they shouldn’t give them access to their social media accounts. Parents should also familiarize themselves with signs of cyberbullying , such as increased device use, anger or anxiety after using a device, or hiding devices when others are nearby.

Communicating regularly about cyberbullying is an important component in preventing it from affecting your child’s well-being. Psychologists recommend talking to kids about how to be safe online before they have personal access to the internet. Familiarize your child with the concept of cyberbullying as soon as they can understand it. Develop a game plan to problem solve if it occurs. Cultivating open dialogue about cyberbullying can ensure kids can identify the experience and tell an adult, before it escalates into a more harmful situation.

It’s also important to teach kids what to do if someone else is being victimized. For example, encourage your child to tell a teacher or parent if someone they know is experiencing cyberbullying.

Keep in mind kids may be hesitant to open up about cyberbullying because they’re afraid they’ll lose access to their devices. Encourage your child to be open with you by reminding them they won’t get in trouble for talking to you about cyberbullying. Clearly explain your goal is to allow them to communicate with their friends safely online.

How can I report cyberbullying?

How you handle cyberbullying depends on a few factors, such as the type of bullying and your child’s age. You may choose to intervene by helping a younger child problem solve whereas teens may prefer to handle the bullying on their own with a caregiver’s support.

In general, it’s a good practice to take screenshots of the cyberbullying incidents as a record, but not to respond to bullies’ messages. Consider blocking cyberbullies to prevent future harassment.

Parents should contact the app or website directly about removing bullying-related posts, especially if they reveal private or embarrassing information. Some social media sites suspend perpetrators’ accounts.

If the bullying also occurs at school or on a school-owned device, or if the bullying is affecting a child’s school performance, it may be appropriate to speak with your child’s teacher or school personnel.

What are the legal ramifications of cyberbullying?

In some cases, parents should report cyberbullying to law enforcement. If cyberbullying includes threats to someone’s physical safety, consider contacting your local police department.

What’s illegal can vary from state to state. Any illegal behaviors, such as blackmailing someone to send money, hate crimes, stalking, or posting sexual photos of a minor, can have legal repercussions. If you’re not sure about what’s legal and what’s not, check your state’s laws and law enforcement .

Are big tech companies responsible for promoting positive digital spaces?

In an ideal world, tech companies would prioritize creating safer online environments for young people. Some companies are working toward it already, including partnering with psychologists to better understand how their products affect kids, and how to keep them safe. But going the extra mile isn’t always profitable for technology companies. For now, it’s up to individuals, families, and communities to protect kids’ and teens’ best interest online.

What does the research show about psychology’s role in reducing this issue?

Many studies show preventative measures can drastically reduce cyberbullying perpetration and victimization . Parents and caregivers, schools, and technology companies play a role in educating kids about media literacy and mental health. Psychologists—thanks to their expertise in child and teen development, communication, relationships, and mental health—can also make important contributions in preventing cyberbullying.

Because cybervictimization coincides with anxiety and depression, research suggests mental health clinicians and educators should consider interventions that both address adolescents’ online experiences and support their mental, social, and emotional well-being. Psychologists can also help parents speak to their kids about cyberbullying, along with supporting families affected by it.

You can learn more about cyberbullying at these websites:

  • Cyberbullying Research Center
  • StopBullying.gov
  • Nemours Kids Health

Acknowledgments

APA gratefully acknowledges the following contributors to this publication:

  • Sarah Domoff, PhD, associate professor of psychology at Central Michigan University
  • Dorothy Espelage, PhD, William C. Friday Distinguished Professor of Education at the University of North Carolina
  • Stephanie Fredrick, PhD, NCSP, assistant professor and associate director of the Dr. Jean M. Alberti Center for the Prevention of Bullying Abuse and School Violence at the University at Buffalo, State University of New York
  • Brian TaeHyuk Keum, PhD, assistant professor in the Department of Social Welfare at the UCLA Luskin School of Public Affairs
  • Mitchell J. Prinstein, PhD, chief science officer at APA
  • Susan Swearer, PhD, Willa Cather Professor of School Psychology, University of Nebraska-Lincoln; licensed psychologist



Cyberbullying: Twenty Crucial Statistics for 2024

Written By: Security.org Team | Updated: June 6, 2024


  • Cyberbullying occurs on every social media platform, but mostly on YouTube, Snapchat, TikTok, and Facebook.
  • Kids of all ages become victims of cyberbullying, but the risk increases as they grow older.
  • Cyberbullying is more prevalent than most people think. Read our article on 5 shocking cyberbullying facts every parent should know to learn more.

Bullying takes on many forms, but today, a new kind of bullying has emerged – cyberbullying. It’s slightly different from what we were used to in our youth, but the consequences are just as grim, if not more.

Cyberbullying happens online and digitally. There’s no shoving or physical harm involved, so it’s not as easy to spot as physical bullying. What’s worse, it can happen even if your kid doesn’t leave the house. A hateful message or over-the-line teasing sent to their inbox, a demeaning video of them going viral, or degrading rumors spreading online are all types of cyberbullying.

As parents, it’s scary to think that our kids might get bullied right under our noses. That’s why we have to arm ourselves with knowledge about cyberbullying to better protect them. Let’s start with 20 critical cyberbullying facts you need to know.

Pro Tip: Limiting your child’s social media use with parental control tools can help prevent cyberbullying.


What is Cyberbullying?

Cyberbullying is bullying that happens through digital devices such as phones or computers. It often happens over social media, text, email, instant messages, and gaming. Cyberbullying often takes the form of sending or sharing harmful or mean content about someone to embarrass them. Sometimes this content is shared anonymously, making cyberbullying feel even more threatening.

Given the broad definition of cyberbullying, numbers and statistics around it can sometimes vary wildly. There are also different interpretations of what it really is and most studies rely on victims self-reporting instances of bullying committed against them. We were all children once, and we know that a lot of kids don’t resort to telling on their bullies in fear of further harm. All those factors create discrepancies in cyberbullying statistics. The bottom line though is that cyberbullying is quickly becoming a major problem in our society.

Here are some statistics to prove that:

  • According to our cyberbullying research , in which we studied parents of kids between the ages of 10 and 18, 21 percent of parents claimed that their children have been cyberbullied.
  • 56 percent of these reports occurred from January to July 2020. We believe this increase correlates with the increased time spent online during COVID-19 lockdowns.
  • Cyberbullying affects more than just kids. In a 2020 study, it was found that 44 percent of all internet users in the U.S. have experienced harassment online, which can be considered a type of cyberbullying. The most common type of online harassment was name-calling, making up 37 percent of all harassment.
  • Of all the social networks, kids on YouTube are the most likely to be cyberbullied at 79 percent, followed by Snapchat at 69 percent, TikTok at 64 percent, and Facebook at 49 percent.
  • We also found that, as a child’s age increased, so did the likelihood of cyberbullying. As the child aged in two-year intervals between the ages of 10 and 18, their likelihood of being cyberbullied increased by 2 percent.
  • Children from households with annual incomes of under $75,000 were twice as likely to be cyberbullied than kids from houses with annual incomes of over $75,000 (22 versus 11 percent).
  • Cyberbullying can bring up various emotions in the victim, but the most common response is anger. Over half of teens who have experienced cyberbullying felt resentment toward their bully, while about a third felt hurt.
  • Cyberbullying also affects how a victim feels about themselves. Two-thirds of cyberbullying victims said that getting bullied online had a negative impact on how they felt about themselves, bringing up feelings of insecurity and low self-worth.
  • Lastly, studies show that cyberbullying can have lasting mental, physical, and social impacts. Nearly a third of cyberbullying victims said the incidents affected their friendships, whereas 13 percent said it affected their physical health.
  • The most effective way to prevent cyberbullying , teens say, is to block the bully, according to the National Crime Prevention Council.
  • 36 percent asked the bully to stop cyberbullying them.
  • 34 percent blocked all communication with the bully.
  • 29 percent did nothing.
  • 11 percent talked to their parents about the incidents.
  • Almost two-thirds of tweens said that they tried to help someone who was being bullied online, and 30 percent had tried to help multiple times, according to the Cyberbullying Research Center.

As teens and young adults spend more of their time online, cyberbullying has become a major issue. The fact that perpetrators hide behind screens does not make the effects of cyberbullying any less damaging to those involved. Teens themselves agree that cyberbullying is a major problem but do not feel like those in charge are doing enough to address it. Anti-bullying organizations and campaigns aim to educate and empower people to prevent and handle cyberbullying, but the overall feeling from today’s youth is that social media companies and our elected officials should do more to prevent cyberbullying and protect kids online. For more information on how to prevent and handle cyberbullying, check out our cyberbullying resources.

  • Statista. (2020). Increased time spent on media consumption due to the coronavirus outbreak among internet users worldwide as of March 2020, by country . statista.com/statistics/1106766/media-consumption-growth-coronavirus-worldwide-by-country/
  • Pew Research Center. (2020). Parenting Children in the Age of Screens . pewresearch.org/internet/2020/07/28/parenting-children-in-the-age-of-screens/
  • Security.org. (2022). The Best VPN of 2022 . security.org/vpn/best/
  • Morning Consult. (2020). YouTube, Netflix and Gaming: A Look at What Kids Are Doing With Their Increased Screen Time . morningconsult.com/2020/08/20/youtube-netflix-and-gaming-a-look-at-what-kids-are-doing-with-their-increased-screen-time/
  • Statista. (2021). U.S. internet users who have experienced cyber bullying 2020 . statista.com/statistics/333942/us-internet-online-harassment-severity/
  • Security.org. (2022). The Best Identity Theft Protection Services of 2022 . security.org/identity-theft/best/
  • National Crime Prevention Council. (2021). Stop Cyberbullying Before it Starts . archive.ncpc.org/resources/files/pdf/bullying/cyberbullying.pdf
  • Cyberbullying Research Center. (2021). Tween Cyberbullying in 2020 . i.cartoonnetwork.com/stop-bullying/pdfs/CN_Stop_Bullying_Cyber_Bullying_Report_9.30.20.pdf


  • Open access
  • Published: 14 January 2023

Prevalence and related risks of cyberbullying and its effects on adolescent

  • Gassem Gohal 1 ,
  • Ahmad Alqassim 2 ,
  • Ebtihal Eltyeb 1 ,
  • Ahmed Rayyani 3 ,
  • Bassam Hakami 3 ,
  • Abdullah Al Faqih 3 ,
  • Abdullah Hakami 3 ,
  • Almuhannad Qadri 3 &
  • Mohamed Mahfouz 2  

BMC Psychiatry, volume 23, Article number: 39 (2023)


Background

Cyberbullying is an increasingly common means of inflicting harm on others, especially among adolescents. This study aims to assess the prevalence of cyberbullying, determine the risk factors, and assess the association between cyberbullying and the psychological status of adolescents facing this problem in the Jazan region, Saudi Arabia.

Methods

A cross-sectional study was conducted on 355 students aged 12–18 years, using a validated online questionnaire to investigate the prevalence and risk factors of cyberbullying and to assess psychological effects based on a cyberbullying questionnaire and the Mental Health Inventory-5 (MHI-5).

Results

The study included 355 participants; 68% were female and 32% male. Approximately 20% of the participants spend more than 12 hours daily on the Internet, and the estimated overall prevalence of cyberbullying was 42.8%, with a slightly higher prevalence among males than females. In addition, 26.3% of the participants reported that their academic performance was significantly affected by cyberbullying. Approximately 20% of all participants considered leaving their schools, 19.7% considered ceasing their Internet use, and 21.1% considered harming themselves due to the consequences of cyberbullying. There are important links between the frequency of harassment, the effect on academic performance, and being a cyber victim.

Conclusions

Cyberbullying showed a high prevalence among adolescents in the Jazan region with significant associated psychological effects. There is an urgency for collaboration between the authorities and the community to protect adolescents from this harmful occurrence.


Introduction

Cyberbullying is an intentional, repeated act of harm toward others through electronic tools; however, there is no consensus on its definition [ 1 , 2 , 3 ]. With the surge in information and data sharing in the emerging digital world, a new era of socialization through digital tools, and the popularization of social media, cyberbullying has become more frequent than ever and occurs where adult supervision is inadequate [ 4 , 5 ]. A large study examining the incidence of cyberbullying among adolescents in England found a prevalence of 17.9%, while one study conducted in Saudi Arabia found a prevalence of 20.97% [ 6 , 7 ]. Cyberbullying can take many forms, including sending angry, rude, or offensive messages; spreading intimidating, cruel, and possibly false information about a person to others; sharing sensitive or private information (outing); and exclusion, which involves purposefully leaving someone out of an online group [ 8 ]. Cyberbullying is influenced by age, sex, parent–child relationships, and time spent on the Internet [ 9 , 10 ]. Although some studies have found that cyberbullying continues to increase in late adolescence, others have found that it tends to peak at 14 and 15 years old before decreasing through the remaining years of adolescence [ 11 , 12 , 13 ].

The COVID-19 epidemic has impacted the prevalence of cyberbullying since social isolation regulations have reduced face-to-face interaction, leading to a significant rise in the use of social networking sites and online activity. As a result, there was a higher chance of experiencing cyberbullying [ 14 ].

Unlike traditional bullying, which usually occurs only at school and can be mitigated at home, victims of cyberbullying can be reached anytime and anywhere. Parents and teachers are seen as saviors in cases of traditional bullying; with cyberbullying, by contrast, children tend to be reluctant to tell adults for fear of losing access to their phones and computers, so they usually hide the incident [ 15 ]. Reports show that cyberbullying is a form of harm not easily avoided by the victim. In addition, identifying the victim and the perpetrator is generally more challenging in cyberbullying than in traditional bullying, which makes an accurate estimation of the problem widely contested [ 16 , 17 ].

There is growing evidence that cyberbullying causes higher levels of depression, anxiety, and loneliness than traditional forms of bullying. A meta-analysis examining the association between peer victimization, cyberbullying, and suicide in children and adolescents indicates that cyberbullying is more strongly related to suicidal ideation than traditional bullying [ 18 ]. Moreover, a significant problem is that cyberbullying impacts adolescents through its persistence and recurrence. A recent report in Saudi Arabia indicated a rise in cyberbullying in secondary schools and higher education, from 18% to approximately 27% [ 19 ]. In primary schools and kindergartens in Saudi Arabia, it was unsurprising to find evidence that children were unaware that cyberbullying is illegal; although that study showed adequate awareness of the problem in Saudi Arabia, there were relatively significant misconceptions [ 20 ].

Adolescents' emotional responses to cyberbullying vary in severity and quality; however, anger, sadness, concern, anxiety, fear, and depression are the most common among adolescent cyber victims [ 21 ]. Moreover, cyberbullying may limit students' academic performance and cause higher absenteeism rates [ 22 ]. Consequently, this study aims to assess the prevalence of cyberbullying, determine the risk factors, and establish the association between cyberbullying and the psychological status of adolescents. We believe our study will extend and significantly add to the literature regarding the nature and extent of cyberbullying in the Jazan region of Saudi Arabia.

Methods

Design and participants

A descriptive cross-sectional study was carried out in the Jazan region, a province of the Kingdom of Saudi Arabia located on the tropical Red Sea coast in the country's southwest. The study targeted adolescents (12–18 years old) in the Jazan region who use the Internet to communicate. Adolescents aged 12–18 years who used the Internet and agreed to participate were included; those outside this age range or who declined to participate were excluded. For participants under 16, a parent and/or legal guardian was notified. The ideal sample size was calculated to be 385 using the Cochran formula, n = z²p(1 − p) / d², where p is the assumed prevalence of cyberbullying (50%), z is the z-value for a 95% confidence interval, and d is the margin of error (no more than 5%). A convenience sample was used to recruit the study participants. A self-administered online questionnaire was used to collect the study information from May to December 2021.
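
As a quick check of the reported target, here is a minimal sketch of the Cochran calculation in Python, assuming z = 1.96 for the 95% confidence interval (a value the text implies but does not state explicitly):

```python
import math

# Cochran's formula: n = z^2 * p * (1 - p) / d^2
z = 1.96   # assumed z-value for a 95% confidence interval
p = 0.50   # assumed prevalence of cyberbullying (maximizes the required sample)
d = 0.05   # margin of error (5%)

n = (z ** 2) * p * (1 - p) / (d ** 2)
print(n, math.ceil(n))  # 384.16 -> 385, matching the reported sample size target
```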

The ethical approval for this study was obtained from the Institutional Review Board (IRB) of Jazan University (Letter v.1 2019, dated 08/04/2021). Informed consent was obtained from all participants; the consent statement was attached to the beginning of the form and had to be read and checked before the participant could proceed to the first part of the questionnaire. For participants under 16, informed consent was obtained from a parent and legal guardian.

Procedure of data collection and study measures

An Arabic self-administered online questionnaire was used for this research. This anonymous survey instrument was built with Google Forms, and the study team distributed it to participants through school teachers. The research team prepared the study questionnaire, choosing the relevant cyberbullying scale questions from similar studies [ 5 , 6 ]. The questionnaire was translated by two bilingual professionals to ensure the accuracy and appropriateness of the instrument wording. A panel of experts then assessed the validity and suitability of the instrument for use with adolescents and added or edited a few questions to accommodate the local culture of Saudi students. The instrument was validated in a pilot study that included 20 participants. The questionnaire was divided into three main sections. The first section collects basic participant information, including gender, age, nationality, school grade, residence, family members, and the mother's occupation and education; the mother's level of education was included because low maternal education has been found to have a detrimental impact on the cyberbullying process [ 23 ]. The second section explores the participant's definition of cyberbullying, exposure to cyberbullying as a victim or as a perpetrator, and possible risk factors behind cyberbullying. The last section explores how cyberbullying affects adolescents psychologically, based on the standardized Mental Health Inventory-5 (MHI-5). The MHI-5 is a well-known, valid, reliable, and brief international instrument for assessing positive aspects of mental health in children and adolescents (such as satisfaction, interest in, and enjoyment of life) as well as negative aspects (such as anxiety and depression) [ 24 ]. It is composed of five questions, as shown in Table 1. Each question offers six response options, ranging from "all the time" (1 point) to "none of the time" (6 points), so the raw score varies between 5 and 30. The final MHI-5 score is determined by adding all item scores and converting the sum to a scale ranging from 0 to 100, with lower scores indicating more severe depressive symptoms. To choose a cut-off, we reviewed the sensitivities and specificities of MHI-5 cut-off points previously employed among adolescents in similar studies, in which the value maximizing the sum of sensitivity and specificity was used, and adopted MHI-5 = 70 as our cut point. The presence of depressive symptoms was therefore defined as an MHI-5 score of ≤ 70 [ 25 ].
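
To make the scoring concrete, here is a minimal sketch of the MHI-5 conversion described above. It assumes the standard linear rescaling of the raw sum (5–30) onto the 0–100 scale and that positively worded items have already been reverse-coded so that higher always means better mental health; the respondent data are hypothetical.

```python
def mhi5_score(item_scores):
    """Convert five MHI-5 item responses (each scored 1-6) to the 0-100 scale."""
    assert len(item_scores) == 5 and all(1 <= s <= 6 for s in item_scores)
    raw = sum(item_scores)            # raw sum ranges from 5 to 30
    return (raw - 5) / 25 * 100       # linearly rescaled to 0-100

# Hypothetical respondent: a score at or below the study's cut-off of 70
# is classified as showing depressive symptoms.
score = mhi5_score([3, 4, 3, 4, 3])   # raw sum 17 -> 48.0
print(score, "depressive symptoms" if score <= 70 else "no depressive symptoms")
```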

The questionnaire was initially prepared in English and then translated into Arabic by a native Arabic speaker fluent in English and experienced in translation. We then performed a pilot study with 20 participants to ensure the readability and understandability of the questions. We also assessed the internal consistency of the questionnaire using Cronbach’s alpha, which produced an acceptable value of 0.672; the internal consistency of the Mental Health Inventory-5 (MHI-5) was 0.557. To assess the factor structure of the Arabic-translated MHI-5, a factor analysis was conducted; the factor loadings are shown in Table 1. Using principal component analysis with varimax rotation, we found a one-component solution explaining 56.766% of the total variance. Loadings on the first factor ranged from 0.688 to 0.824, confirming that a single factor explains all items of the scale. In addition, Bartlett’s test of sphericity was significant (p < 0.001).
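
For illustration, here is a minimal sketch of how an internal-consistency coefficient such as Cronbach’s alpha can be computed from item-level responses; the pilot data below are hypothetical, not the study’s actual responses.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the total score
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical pilot sample: 20 respondents x 5 MHI-5 items, scores 1-6
rng = np.random.default_rng(0)
pilot = rng.integers(1, 7, size=(20, 5))
print(round(cronbach_alpha(pilot), 3))
```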

Data presentation & statistical analysis

Simple tabulated frequencies were used to give a general overview of the data. The prevalence of cyberbullying was presented with 95% confidence intervals, and the chi-squared test was performed to determine associations between individual categorical variables and mental health. Univariate and multivariate logistic regression models were fitted, and unadjusted and adjusted odds ratios (ORs) with their 95% confidence intervals (CIs) were calculated. A p-value of 0.05 or less was used as the cut-off for statistical significance. The statistical analysis was completed using SPSS ver. 25.0 (SPSS Inc., Chicago, IL, USA).
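
As an illustration of the unadjusted (crude) odds ratios reported in the results, here is a minimal sketch of how an OR and its 95% confidence interval can be derived from a 2×2 table; the counts are hypothetical and not taken from the paper.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a 95% CI from a 2x2 table.

    a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, (lower, upper)

# Hypothetical cross-tabulation: cyber victimization vs. MHI-5 <= 70
print(odds_ratio_ci(a=90, b=62, c=60, d=143))
```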

Results

The distributed survey targeted approximately 385 students; the precise number of respondents was 355 (a 92% response rate), of whom 68% were female and 32% male. More than half of the respondents were secondary school students, with a nearly equal mix of respondents living in cities and rural areas. Table 2 demonstrates that 20% of the participants spend more than 12 hours daily on the Internet and electronic gadgets, while only 13% spend less than two hours.

As demonstrated in Table 3 , the total prevalence of cyberbullying was estimated to be 42.8%, with male prevalence somewhat higher than female prevalence. Additional variables, such as the number of hours spent on the Internet, did not affect the prevalence. Table 4 shows the pattern and experience of being cyberbullied across mental health levels, as measured by the MHI-5.

Academic performance was significantly affected by cyberbullying in 26.3% of the participants. Furthermore, approximately 20% of all participants considered leaving their schools for this reason. Moreover, 19.7% of the participants thought of stopping using the Internet and electronic devices, while 21.1% considered harming themselves due to the effects of cyberbullying. Regarding associations between the various variables and psychological effects measured with the MHI-5, there were significant associations with having previously been a cyber victim (cOR 2.8), the frequency of harassment (cOR 1.9), affected academic performance (cOR 6.5), and considering leaving school as a result of being a cyber victim (cOR 3.0). Univariate logistic regression analysis also showed significant associations between the psychological effects and participants' thoughts of getting rid of the bully (cOR 2.8), of stopping the use of electronic devices (cOR 3.0), and of hurting themselves as a result of cyberbullying (cOR 6.4). In the multivariate logistic regression analysis, frequency of harassment was the only statistically significant predictor of mental health among adolescents (aOR 2.8); other variables retained elevated aORs but without statistical significance. All these results are shown in Table 4.

Discussion

Cyberbullying prevalence rates among adolescents vary widely worldwide, ranging from 10% to more than 70% across studies. This variation results from several factors, with gender involvement a decisive influence [ 26 , 27 ]. Our study found a prevalence of 42.8% (95% confidence interval (CI): 37.7–48), which is higher than the median reported prevalence of 23.0% in a scoping review of 36 studies conducted in the United States among adolescents aged 12 to 18 years [ 28 ]. A systematic review found that cyberbullying prevalence ranged from 6.5% to 35.4% [ 3 ]. Both of these reviews gathered data before the COVID-19 pandemic; more recent studies have found that cyberbullying increased dramatically during the COVID-19 era [ 29 , 30 ]. With the worldwide shift to online communication for teaching and learning, young adolescents faced far greater exposure to cyberspace and its associated risks. Psychological distress due to COVID-19 and spending far more time on the Internet are important factors in this problem and may be a reasonable explanation for our results.

There are insufficient data to compare our findings with the Arab world context, notably Saudi Arabia. One study among Saudi Arabian university students reported a prevalence of 17.6% [ 31 ]; the considerable discrepancy between that figure and our findings is most plausibly explained by the difference in the target age groups studied. Age is a crucial risk factor for cyberbullying, and according to one study, cyberbullying peaks at around 14 and 15 years of age and then declines in late adolescence; thus, an inverted-U relationship exists between prevalence and age [ 11 , 12 , 13 , 32 ].

In our study, males reported being more vulnerable to cyberbullying despite there being more female participants; this finding, which is inconsistent with previous literature, requires further investigation. A strong, though not recent, 2014 meta-analysis reported that, in general, males are more likely to cyberbully than females, while females were more likely than males to report cyberbullying during early to mid-adolescence [ 11 ]. This raises concerns about reporting differences between males and females in our data and questions about whether conservative cultural or religious values play a role.

Increased Internet hours were another risk factor in this study and were significantly associated with cyberbullying, particularly among heavy Internet users (> 12 h/day); a similar result was documented in a comparable study [ 3 ]. Notably, while some studies have reported that those living in city areas are more likely to be cyberbullying victims than their counterparts from suburban areas [ 3 ], we observed no significant influence of this factor on the prevalence of cyberbullying.

According to a population-based study on cyberbullying and teenage well-being in England, which included 110,000 pupils, traditional bullying accounted for more of the variability in mental well-being than cyberbullying; it did, however, conclude that both types of bullying carry a risk of affecting mental health [ 33 ]. We confirmed in this study that repeated occurrences of cyberbullying and being a victim are risk factors influencing mental health ( P  < 0.001). Moreover, the frequency of harassment also shows a significant, influential effect. The victim's desire to be free of the perpetrator is probably an alarming sign and a precursor of suicidal ideation; we found that nearly half of the participants wished they could get rid of the perpetrators. Furthermore, more than 20% of participants considered harming themselves due to cyberbullying; this result is consistent with many studies that have linked cyberbullying with self-harm and suicidal thoughts [ 34 , 35 , 36 ].

Adolescence is a particularly vulnerable period for the mental health effects of cyberbullying. In one Saudi Arabian study, parents felt that cyberbullying is more detrimental to their children's mental health than schoolyard bullying, and reported that video games were the most common platform on which it occurred [ 37 ]. Both cross-sectional and longitudinal research shows a significant link between cyberbullying and emotional symptoms, including anxiety and depression [ 38 , 39 ]. We therefore employed the MHI-5 to measure the mental health impact of cyberbullying on adolescents in this study. The MHI-5 has shown relatively high sensitivity for detecting anxiety and depressive disorders in general health and quality-of-life assessments. Its items cover feelings of happiness, peacefulness, and calmness, as well as episodes of nervousness, downheartedness, and depression, as given in Table 1 .
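As a rough illustration of how the MHI-5 yields a single score, the sketch below assumes the commonly used 1–6 response scale and the linear 0–100 transformation; the exact item wording, coding direction, and any cut-off values should be taken from the instrument itself rather than from this sketch.

```python
# Minimal sketch of MHI-5 scoring. Assumes each item is answered on a
# 1-6 frequency scale (1 = "none of the time" ... 6 = "all of the time")
# and that the final score is linearly rescaled to 0-100, with higher
# scores indicating better mental well-being. Verify item wording,
# response options, and cut-offs against the instrument itself.

def mhi5_score(nervous, down, calm, depressed, happy):
    """Return an MHI-5 score on a 0-100 scale (higher = better well-being)."""
    # Negative-feeling items are reverse-coded so that never experiencing
    # them (a rating of 1) contributes the maximum of 6 points.
    items = [7 - nervous, 7 - down, calm, 7 - depressed, happy]
    raw = sum(items)                  # raw score ranges from 5 to 30
    return (raw - 5) / 25 * 100       # linear transform to 0-100

# Example: frequent anxiety and low mood with little calm or happiness
# produce a low score.
print(mhi5_score(nervous=5, down=5, calm=2, depressed=5, happy=2))  # 20.0
```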

Cyberbullying is well documented to affect the academic achievement of victimized adolescents: bullied adolescents are more likely to miss school, have higher absence rates, dislike school, and report receiving lower grades. According to one meta-analysis, peer victimization has a significant negative association with academic achievement, whether measured by grades, student performance, or teacher ratings [ 40 ]. In our investigation, up to 20% of participants considered leaving their schools because of the adverse effects of cyberbullying (cOR 3.0) and wished they could stop using the Internet, and 26% felt that their school performance suffered as a result of being cyber victims (cOR 6.5). The univariate analysis yielded high odds ratios for impaired school performance and willingness to leave school, with significant p-values, as shown in Table 5 .
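The crude odds ratios (cOR) reported above come from univariate analysis of 2 × 2 cross-tabulations. The sketch below shows how such a ratio and its 95% confidence interval are computed; the cell counts are hypothetical, chosen only to produce a cOR near 6.5, and are not taken from the study data.

```python
# Minimal sketch: crude odds ratio (cOR) and 95% CI from a 2x2 table.
# Cell counts are hypothetical and for illustration only.
import math

# Rows: cyberbullying victim (yes/no); columns: school performance affected (yes/no)
a, b = 60, 240   # victims: affected / not affected
c, d = 10, 260   # non-victims: affected / not affected

odds_ratio = (a * d) / (b * c)
log_se = math.sqrt(1/a + 1/b + 1/c + 1/d)              # SE of log(OR)
lower = math.exp(math.log(odds_ratio) - 1.96 * log_se)
upper = math.exp(math.log(odds_ratio) + 1.96 * log_se)

print(f"cOR = {odds_ratio:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```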

In this study, approximately 88% of the participants were cyber victims, compared with only 11% who had perpetrated cyberbullying against their peers. Adverse mental health effects are well reported among cyber victims, who show higher rates of depression than cyberbullying perpetrators [ 41 , 42 ]. However, other studies indicate that victims are not the only ones affected; perpetrators are harmed as well, showing high stress levels, poor school performance, and an increased risk of depression and alcohol misuse. Furthermore, research shows that adolescents who were victims or perpetrators of cyberbullying tend to continue similar behavior into early adulthood [ 43 , 44 ].

Limitations of the study

Although the current study found a high prevalence and positive associations among variables, it should be emphasized that it was conducted on a defined sample of respondents aged 11 to 18 years. The results therefore cannot be generalized to other samples, age groups, or communities from other cultures and contexts. In addition, the study was limited to adolescent survey responses, did not include the viewpoints of parents and caretakers, and did not capture other risk factors such as parental divorce and financial status. We believe future studies should consider parents' perspectives and analyze perpetrators' characteristics in more depth. Moreover, self-reported tools are susceptible to social desirability bias, which can influence responses to test items. As a result, future research should employ a variety of monitoring and evaluation metrics, larger populations, and wider age ranges. Another limitation is that we cannot draw conclusive inferences regarding gender and exact prevalence because male adolescents had a lower response rate than female adolescents, suggesting that males may be more reluctant to disclose these issues.

Although cyberbullying is typically studied by social scientists, it is crucial to investigate it from a clinical perspective because it significantly affects mental health. Adolescents' lives have become increasingly centered on online communication, which creates opportunities for adverse psychological outcomes and aggressive behavior such as cyberbullying. Stress, anxiety, depressive symptoms, suicidal ideation, and deteriorating school performance are all linked to cyberbullying. We therefore emphasize the need for parents and educators to be conscious of these dangers and to act as the first line of protection for adolescents by recognizing, addressing, and resolving the problem. We also urge pediatricians, physicians, and psychiatric consultants to create a comfortable atmosphere in which adolescents can disclose and report cyberbullying early, and to raise awareness of the problem in their communities. In addition, practical strategies for dealing with such incidents, involving health, education, and legal authorities, should be supported to tackle a problem that can affect adolescents both mentally and academically. Lastly, to determine how to intervene most effectively, more research is needed on the many ways in which schools, communities, and healthcare providers address cyberbullying.

Availability of data and materials

The authors confirm that the data supporting the findings of this study are available within the article. The raw data are available from the corresponding author upon reasonable request.

Krešić Ćorić M, Kaštelan A. Bullying through the Internet - Cyberbullying. Psychiatr Danub. 2020;32(Suppl 2):269–72.


Englander E, Donnerstein E, Kowalski R, Lin CA, Parti K. Defining Cyberbullying. Pediatrics. 2017;140(Suppl 2):148–51. https://doi.org/10.1542/peds.2016-1758U .


Bottino SM, Bottino CM, Regina CG, Correia AV, Ribeiro WS. Cyberbullying and adolescent mental health: a systematic review. Cad Saude Publica. 2015;31(3):463–75. https://doi.org/10.1590/0102-311x00036114 .

Martín-Criado JM, Casas JA, Ortega-Ruiz R. Parental Supervision: Predictive Variables of Positive Involvement in Cyberbullying Prevention. Int J Environ Res Public Health. 2021;18(4):1562. https://doi.org/10.3390/ijerph18041562 .

Uludasdemir D, Kucuk S. Cyber Bullying Experiences of Adolescents and Parental Awareness: Turkish Example. J Pediatr Nurs. 2019;44:84–90. https://doi.org/10.1016/j.pedn.2018.11.006 .

Jaffer M, Alshehri K, Almutairi M, Aljumaiah A, Alfraiji A, Hakami M, Al-Dossary M, Irfan T. Cyberbullying among young Saudi online gamers and its relation to depression. J Nat Sci Med. 2021;4(2):142–7. https://doi.org/10.4103/JNSM.JNSM_78_20 .

Cyberbullying: An Analysis of Data from the Health Behaviour in School-aged Children (HBSC) Survey for England; 2014. Available from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/621070/Health_behaviour_in_school_age_children_cyberbullying.pdf [Last accessed on 2022 Nov 09].

LI Q. Bullying in the new playground: Research into cyberbullying and cyber victimisation. Australasian J Educ Tech. 2007;23:435–54.

Rao J, Wang H, Pang M, et al. Cyberbullying perpetration and victimisation among junior and senior high school students in Guangzhou. China Inj Prev. 2019;25(1):13–9. https://doi.org/10.1136/injuryprev-2016-042210 .

Sampasa-Kanyinga H, Lalande K, Colman I. Cyberbullying victimisation and internalising and externalising problems among adolescents: the moderating role of parent-child relationship and child’s sex. Epidemiol Psychiatr Sci. 2018;29:8. https://doi.org/10.1017/S2045796018000653 .

Kowalski RM, Giumetti GW, Schroeder AN, Lattanner MR. Bullying in the digital age: a critical review and meta-analysis of cyberbullying research among youth. Psychol Bull. 2014;140(4):1073–1137. https://doi.org/10.1037/a0035618 .

Tokunaga RS. Following you home from school: A critical review and synthesis of research on cyberbullying victimization. Comput Hum Behav. 2010;26(3):277–87.

Pichel R, Foody M, O’Higgins Norman J, Feijóo S, Varela J, Rial A. Bullying, Cyberbullying and the Overlap: What Does Age Have to Do with It? Sustainability. 2021;13(15):8527.

Shin SY, Choi Y-J. Comparison of Cyberbullying before and after the COVID-19 Pandemic in Korea. Int J Environ Res Public Health. 2021;18(19):10085. https://doi.org/10.3390/ijerph181910085 .


Cassidy W, Jackson M, Brown KN. Sticks and stones can break my bones, but how can pixels hurt me?: Students’ experiences with cyberbullying. Sch Psychol Int. 2009;30:383–402. https://doi.org/10.1177/0143034309106948 .

Bonanno RA, Hymel S. Cyberbullying and internalizing difficulties: above and beyond the impact of traditional forms of Bullying. J Youth Adolesc. 2013;42(5):685–97. https://doi.org/10.1007/s10964-013-9937-1 .

Peebles E. Cyberbullying: Hiding behind the screen. Paediatr Child Health. 2014;19(10):527–8. https://doi.org/10.1093/pch/19.10.527 .

Landstedt E, Persson S. Bullying, cyberbullying, and mental health in young people. Scandinavian Journal of Public Health. 2014;42(4):393–9. https://doi.org/10.1177/1403494814525004 .

Al-Zahrani AM. Cyberbullying among Saudi’s Higher-Education Students: Implications for Educators and Policymakers. World Journal of Education. 2015;5(3). https://doi.org/10.5430/WJE.V5N3P15 .

Allehyani SH. Cyberbullying and It's Impact on The Saudi Kindergarten Children. Journal of Arts, Literature, Humanities, and Social Sciences. 2018;4(22):307-329. https://doi.org/10.33193/1889-000-022-016 .

Ortega R, Elipe P, Mora-Merchan JA, Genta ML, Brighi A, Guarini A, et al. The emotional impact of bullying and cyberbullying on victims: a European Cross-National Study. Aggress Behav. 2012;38:342–56.

Kowalski RM, Limber SP. Psychological, physical, and academic correlates of cyberbullying and traditional bullying. J Adolesc Health. 2013;53(1 Suppl):13–20. https://doi.org/10.1016/j.jadohealth.2012.09.018 .

Chen Q, Lo CKM, Zhu Y, Cheung A, Chan KL, Ip P. Family poly-victimization and cyberbullying among adolescents in a Chinese school sample. Child Abuse Negl. 2018;77:180–7. https://doi.org/10.1016/j.chiabu.2018.01.015 .

Rivera-Riquelme M, Piqueras JA, Cuijpers P. The Revised Mental Health Inventory-5 (MHI-5) as an ultra-brief screening measure of bi-dimensional mental health in children and adolescents. Psychiatry Res. 2019;274:247–53. https://doi.org/10.1016/j.psychres.2019.02.045 .

van den Beukel TO, et al. Comparison of the SF-36 Five-item Mental Health Inventory and Beck Depression Inventory for the screening of depressive symptoms in chronic dialysis patients. Nephrol Dial Transplant. 2012;27(12):4453–7.

Smith PK, Mahdavi J, Carvalho M, Fisher S, Russell S, Tippet N. Cyberbullying: Its nature and impact in secondary school pupils. J Child Psychol Psychiat. 2008;49:376–85. https://doi.org/10.1111/j.1469-7610.2007.01846.x .

Selkie EM, Fales JL, Moreno MA. Cyberbullying prevalence among United States middle and high school aged adolescents: a systematic review and quality assessment. J Adolesc Health. 2016;58:125–33. https://doi.org/10.1016/j.jadohealth.2015.09.026 .

Hamm MP, Newton AS, Chisholm A, et al. Prevalence and Effect of Cyberbullying on Children and Young People: A Scoping Review of Social Media Studies. JAMA Pediatr. 2015;169(8):770–7. https://doi.org/10.1001/jamapediatrics.2015.0944 .

Zhang Y, Xu C, Dai H, Jia X. Psychological Distress and Adolescents’ Cyberbullying under Floods and the COVID-19 Pandemic: Parent-Child Relationships and Negotiable Fate as Moderators. Int J Environ Res Public Health. 2021;18(23):12279. https://doi.org/10.3390/ijerph182312279 .

Barlett CP, Simmers MM, Roth B, Gentile D. Comparing cyberbullying prevalence and process before and during the COVID-19 pandemic. J Soc Psychol. 2021;161(4):408–18. https://doi.org/10.1080/00224545.2021.1918619 .

Al Qudah MF, Al-Barashdi HS, Hassan EMAH, et al. Psychological Security, Psychological Loneliness, and Age as the Predictors of Cyber-Bullying Among University Students. Community Ment Health J. 2020;56(3):393–403. https://doi.org/10.1007/s10597-019-00455-z .

Hinduja S. Cyberbullying in 2021 by Age, Gender, Sexual Orientation, and Race. Cyberbullying Research Center. https://cyberbullying.org/cyberbullying-statistics-age-gender-sexual-orientation-race .

Przybylski AK, Bowes L. Cyberbullying and adolescent well-being in England: a population-based cross-sectional study. Lancet Child Adolesc Health. 2017;1:19–26. https://doi.org/10.1016/S2352-4642(17)30011-1 .

O’Connor RC, Rasmussen S, Miles J, Hawton K. Self-harm in adolescents: self-report survey in schools in Scotland. Br J Psychiatry. 2009;194(1):68–72.

John A, Glendenning AC, Marchant A, Montgomery P, Stewart A, Wood S, Lloyd K, Hawton K. Self-harm, suicidal Behaviours, and Cyberbullying in children and young people: a systematic review. J Med Internet Res. 2018;20(4): e129.

Nguyen HTL, Nakamura K, Seino K, et al. Relationships among cyberbullying, parental attitudes, self-harm and suicidal behavior among adolescents: results from a school-based survey in Vietnam. BMC Public Health. 2020;20:476. https://doi.org/10.1186/s12889-020-08500-3 .

Alfakeh SA, Alghamdi AA, Kouzaba KA, Altaifi MI, Abu-Alamah SD, Salamah MM. Parents’ perception of cyberbullying of their children in Saudi Arabia. J Family Community Med. 2021;28(2):117–24. https://doi.org/10.4103/jfcm.JFCM_516_20 .

Kim S, Boyle MH, Georgiades K. Cyberbullying victimization and its association with health across the life course: a Canadian population study. Can J Public Health. 2018;108:468–74.

Fahy AE, Stansfeld SA, Smuk M, et al. Longitudinal associations between cyberbullying involvement and adolescent mental health. J Adolesc Health. 2016;59:502–9.

Gardella JH, Fisher BW, Teurbe-Tolon AR. A Systematic Review and Meta-Analysis of Cyber-Victimization and Educational Outcomes for Adolescents. Rev Educ Res. 2017;87(2):283–308. https://doi.org/10.3102/0034654316689136 .

Nansel TR, Craig W, Overpeck MD, Saluja G, Ruan WJ. Cross-national consistency in the relationship between bullying behaviors and psychosocial adjustment. Arch Pediatr Adolesc Med. 2004;158(8):730–6.

Sourander A, Brunstein Klimek A, Ikonen M, et al. Psychosocial risk factors associated with cyberbullying among adolescents: a population-based study. Arch Gen Psychiatry. 2010;67(7):720–8.

Daniela Š, Ivana D, Marija M. Psychological Outcomes of Cyber-Violence on Victims, Perpetrators and Perpetrators/Victims. Hrvat Rev Za Rehabil Istraz. 2017;53:98–110.

Selkie EM, Kota R, Chan Y-F, Moreno M. Cyberbullying, depression, and problem alcohol use in female college students: a multisite study. Cyberpsychology Behav Soc Netw. 2015;18(2):79–86.


Acknowledgements

We want to acknowledge the help and appreciate the efforts of the participating students and their guardians during data collection.

Author information

Authors and Affiliations

Pediatric Department, Faculty of Medicine, Jazan University, Jazan, Saudi Arabia

Gassem Gohal & Ebtihal Eltyeb

Family and Community Medicine Department, Faculty of Medicine, Jazan University, Jazan, Saudi Arabia

Ahmad Alqassim & Mohamed Mahfouz

Medical Intern, Faculty of Medicine, Jazan University, Jazan, Saudi Arabia

Ahmed Rayyani, Bassam Hakami, Abdullah Al Faqih, Abdullah Hakami & Almuhannad Qadri


Contributions

GG, EE, and AA designed the study, collected the data, performed the statistical analysis, wrote, edited, and revised the manuscript, approved the final manuscript, and are responsible for the integrity of the research.

AR, BH, AF, AH, AQ, and MM contributed to data collection, statistical analysis, and manuscript writing, editing, and revision, and approved the final manuscript.

Corresponding author

Correspondence to Ahmad Alqassim .

Ethics declarations

Ethics approval and consent to participate

Ethical approval for this study was obtained from the Institutional Review Board (IRB) of Jazan University (Letter v.1 2019, dated 08/04/2021). Informed consent was obtained from all participants; for participants under the age of 16, informed consent was also obtained from a parent or legal guardian. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors state that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Gohal, G., Alqassim, A., Eltyeb, E. et al. Prevalence and related risks of cyberbullying and its effects on adolescent. BMC Psychiatry 23 , 39 (2023). https://doi.org/10.1186/s12888-023-04542-0


Received : 05 August 2022

Accepted : 11 January 2023

Published : 14 January 2023

DOI : https://doi.org/10.1186/s12888-023-04542-0


  • Cyberbullying
  • Psychological effects
  • Adolescents
  • Public health
  • Mental Health
  • Saudi Arabia

BMC Psychiatry

ISSN: 1471-244X



Cyberbullying data, facts and statistics for 2018 – 2024

Cyberbullying facts and statistics

*This list of cyberbullying statistics from 2018-2024 is regularly updated with the latest facts, figures, and trends.

Internet connectivity is important because it provides both educational and social benefits for young people. Unfortunately, these positive attributes are counterbalanced by potentially dangerous consequences.

Alongside improving communication and democratizing access to information, the internet lets people conceal themselves behind a mask of anonymity. This creates a whole new set of risks for children – and often adults too.

The internet creates not only a threat for teens who could fall victim to cyberbullying – but also the potential for children to engage in online crimes, trolling, and cyberbullying themselves. That makes cyberbullying a topic that all parents and  guardians need to be aware of.

Schools, governments, and independent organizations are attempting to raise awareness of cyberbullying and online stalking, but the cyberbullying statistics in this article illustrate that the problem is not going away anytime soon.

Cyberbullying around the world

In 2018, Ipsos surveyed adults in 28 countries, creating one of the largest-scale studies on cyberbullying to date. Unfortunately, it hasn’t repeated this survey since, but the old data is nonetheless eye-opening and informative.

In total 20,793 interviews were conducted between March 23 – April 6, 2018, among adults aged 18-64 in the US and Canada, and adults aged 16-64 in all other countries.

Of particular interest are Russia and Japan. In both countries, parents expressed extremely high levels of confidence that their children did not experience cyberbullying of any kind.

Meanwhile, Indian parents remained among the most likely to report that their children had experienced cyberbullying at least sometimes, a share that only grew from 2011 to 2018. Across Europe and the Americas, it also appears that more parents are either becoming aware of their children’s negative experiences with cyberbullying, or that their children are increasingly experiencing such attacks online.

Share of parents reporting that their child has experienced cyberbullying, by country and survey year (percent; "--" indicates no data for that year):

| Country | 2018 | 2016 | 2011 |
| --- | --- | --- | --- |
| India | 37 | 32 | 32 |
| Brazil | 29 | 19 | 20 |
| United States | 26 | 34 | 15 |
| Belgium | 25 | 13 | 12 |
| South Africa | 26 | 25 | 10 |
| Malaysia | 23 | -- | -- |
| Sweden | 23 | 20 | 14 |
| Canada | 20 | 17 | 18 |
| Turkey | 20 | 14 | 5 |
| Saudi Arabia | 19 | 17 | 18 |
| Australia | 19 | 20 | 13 |
| Mexico | 18 | 20 | 8 |
| Great Britain | 18 | 15 | 11 |
| China | 17 | 20 | 11 |
| Serbia | 16 | -- | -- |
| Germany | 14 | 9 | 7 |
| Argentina | 14 | 10 | 9 |
| Peru | 14 | 13 | -- |
| South Korea | 13 | 9 | 8 |
| Italy | 12 | 11 | 3 |
| Poland | 12 | 18 | 12 |
| Romania | 11 | -- | -- |
| Hungary | 10 | 11 | 7 |
| Spain | 9 | 10 | 5 |
| France | 9 | 7 | 5 |
| Chile | 8 | -- | -- |
| Japan | 5 | 7 | 7 |
| Russia | 1 | 9 | 5 |

Global perspectives on cyberbullying

The following chart includes additional perspectives and insight into cyberbullying from a global scale, including:

  • Percent of respondents aware of cyberbullying as a concept
  • Number of countries responding where specific anti-bullying laws exist
  • Respondents who believe current laws are enough to handle cyberbullying cases.

Cyberbullying facts and statistics for 2018-2024

1. 60 percent of parents with children aged 14 to 18 reported them being bullied in 2019.

More parents than ever report that their children are being bullied, whether at school or online. Comparitech conducted a survey of over 1,000 parents with at least one child over the age of 5.

  • 47.7% of parents with children ages 6-10 reported their children were bullied
  • 56.4% of parents with children ages 11-13 reported their children were bullied
  • 59.9% of parents with children ages 14-18 reported their children were bullied
  • 54.3% of parents with children ages 19 and older reported their children were bullied

Bullying statistics infographic

2. One-fifth of all bullying occurs through social media

Although the vast majority of parents reported bullying in school, 19.2% stated that bullying occurred through social media sites and apps. A further 11% indicated bullying occurred through text messages, while 7.9% identified video games as a source. Meanwhile, 6.8% reported bullying occurred on non-social media websites, while 3.3% indicated the bullying occurred through email.

Some parents even witnessed cyberbullying occur, with 10.5% of parents indicating they observed the cyberbullying themselves.

Interestingly, subsequent research indicates that you don’t even need internet access to be affected: Giumetti et al. (2020) found a positive correlation between time spent using a cellphone (but not online) and the likelihood of cyberbullying victimization.

3. Attitudes regarding the pandemic and lockdowns directly contributed to cyberbullying

A study written by scholars at the Universities of Florida and Denver revealed that the global pandemic had a marked effect on cyberbullying levels on Twitter. According to that study, analysis of 454,046 publicly available tweets related to cyberbullying revealed a direct correlation between the pandemic and cyberbullying incidents.

According to Verywell , that increase was due in part to the extra leisure time and online presence that children had due to lockdown and online schooling. A report from Common Sense Media indicated that children and teens spent around 17 percent more time on social media sites due to the pandemic.

Psychological factors, including self-preservation and self-defense behaviors, have also been cited (by Verywell) as possible causes for the sudden rise in cyberbullying and online toxicity during the pandemic.

4. Most parents respond proactively after their children are cyberbullied

There are a large number of ways parents can respond to cyberbullying, but it appears the most common response is to talk to children about online safety.

Comparitech found 59.4% of parents talked to their children about internet safety and safe practices after cyberbullying occurred. Parents may need to take more steps to intervene, however, as only 43.4% identified adjusting parental controls to block offenders, only 33% implemented new rules for technology use, and only 40.6% saved the evidence for investigators.

Very few parents (just 34.9%) notified their child’s school about cyberbullying. And a small number (10.4%) took the nuclear option and completely took away their child’s technology in response.

5. Most teens have now experienced cyberbullying in some way

Cyberbullying: A Narrative Review (Grover et al., 2023) notes that it is difficult to pin down how common cyberbullying is because incidence rates vary by location, the victim’s age, the number of occurrences, and even disagreement over what constitutes online bullying. Still, after reviewing the existing literature, it estimates that the average victimization rate is around 21 percent.

A 2022 Pew Research study found that nearly half of all teens (49%) had experienced some form of cyberbullying. The most common type was offensive name calling, but one in ten had also received physical threats.

Another study from 2021 shows that this isn’t unique to teens, with around 40 percent of Americans under 30 having experienced online harassment. Of these, 50% identified politics as the reason behind the incident.

Among teens, the most common specific types of cyberbullying include:

  • Offensive name-calling (32 percent)
  • Spreading of false rumors (22 percent)
  • Receiving unsolicited explicit images (17 percent)
  • Repeated requests for their location or whereabouts (15 percent)
  • Physical threats (10 percent)
  • Having explicit images of them shared without their consent (7 percent)

Common types of cyberbullying stats

6. Self-reported data gives mixed results

According to the Cyberbullying Research Center , which has been collecting data on the subject since 2007, an average of 29.3% of middle school and high school students report being cyberbullied. This is an increase of 1.5 percent since 2022, though this may be due to the return to in-person learning following the end of the COVID-19 pandemic.

The differences in the reported number of victims between the Pew Research Center and Cyberbullying Research Center are stark, but present an inherent problem with self-reported data related to cyberbullying. Because of the difficulty of gathering data and the inconsistencies in how respondents will answer questions (as well as differences in how and in what format questions are asked), it’s hard to pin down the exact number of young adults who have been cyberbullied at some point in their lives.

The problem could be more or less serious than either research center states.

7. Google Trends data reveals increasing patterns about cyberbullying

Google Trends data indicates much more attention is focused on cyberbullying than ever before. The volume of worldwide searches for “cyberbullying” increased threefold since 2004:

Here’s something interesting: Google searches for “cyberbullying” tend to reach their high point during the middle of the school term before steeply dropping off during the summer and festive season. This applies in both the US and the UK, implying that incidents become more rare the closer to the holidays we get.

Despite this pattern continuing for several years, there was a notable reduction in searches for “cyberbullying” in Fall 2020. This may be due to the large amount of upheaval in students’ lives as a result of the COVID-19 pandemic and the switch to online learning, but without further data, it’s difficult to say for certain. All we know is that since this initial dip, search traffic seems to have returned to its usual pattern.
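Readers who want to inspect this seasonal pattern themselves can pull the same interest-over-time series with pytrends, an unofficial Python wrapper for Google Trends. The snippet below is only a sketch; the library scrapes an undocumented endpoint, so its behavior can change.

```python
# Sketch: fetch worldwide search interest for "cyberbullying" via pytrends,
# an unofficial Google Trends wrapper (pip install pytrends).
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["cyberbullying"], timeframe="all")  # 2004 to present

interest = pytrends.interest_over_time()
# Average by year to see the long-term trend; monthly values show the
# school-term peaks and summer dips described above.
print(interest["cyberbullying"].resample("YS").mean().round(1))
```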

Screenshot of Google Trends' US results for "cyberbullying", 2024

8. Cyberbullying may be contributing to the increase in youth suicides

In the United States, suicide is one of the leading causes of mortality for people between the ages of 10 and 44. According to the CDC, there were 13.7 such incidents per 100,000 citizens in 2021, with rates remaining fairly similar across all regions of the country.

Screenshot of CDC graph showing the leading causes of mortality in the US for 2021

Using the CDC’s online WONDER system, we can tell that there were 23,844 victims of suicide in this demographic that year. Of these, 7,134 victims (29.9 percent) were between the ages of five and 24.

Although the CDC data does not suggest a reason for the increase in suicides, cyberbullying may indeed be part of the equation. A 2022 study from the Lifespan Brain Institute concluded that being a cyberbullying victim corresponds with increased incidence of suicidal thought, though being a perpetrator does not. This mirrors a 2018 study which found that young adults under the age of 25 who were victimized by cyberbullying were twice as likely to commit suicide or self-harm in other ways.

9. Bullying has surprising impacts on identity fraud

It appears bullying has effects beyond self-harm. Javelin Research finds that children who are bullied are 9 times more likely to be the victims of identity fraud as well.

Javelin Research cyberbullying statistics

10. Young adults remain split on content moderation

A 2021 study from the UK anti-bullying organization Ditch the Label found that over 40 percent of people under 25 years old aren’t sure whether social media platforms should be more tightly moderated. Around a third would like to see increased moderation, with 15 percent of respondents being against this move.

11. Most young adults believe cyberbullying is not normal or acceptable behavior

Unfortunately, Ditch the Label changes its questions every year, making it difficult to track changes in attitude over time. Still, this does help it cover a wider range of topics. For instance, its 2017 survey found 77% of young adults do not consider bullying to be simply “part of growing up”. Most (62%) also believe hurtful online comments are just as bad as those made offline. And in a nod to the idea that celebrities are still human, 70% strongly disagree with the idea that it’s OK to send nasty tweets to famous personalities.

All the same, personal perspectives on how to treat others don’t always result in positive behavior. Hypocrisy tends to rule the day, as the Ditch the Label survey also found that 69% of its respondents admitted to doing something abusive to another person online. One study found that adolescents who engaged in cyberbullying were more likely to be perceived as “popular” by their peers.

12. Cyberbullying extends to online gaming, as well

Social media tends to eat up most of the attention related to cyberbullying, but it can occur across any online medium, including online gaming.  In one survey, 90 percent of gamers reported experiencing cyberbullying in-game, with racism, hate speech, and extremist content extremely common.

Meanwhile, a survey of over 2,000 adolescents found that over one-third experienced bullying in mobile games. And a 2020 Ditch the Label survey of over 2,500 young adults found 53% reported being victims of bullying in online gaming environments, while over 70% believe bullying in online games should be taken more seriously.

Online gaming bullying can extend beyond just hurtful words. It can also include the dangerous activity known as swatting , in which perpetrators locate the home address of the victim and make a false criminal complaint to the victim’s local police, who then “send in the SWAT team” as a response. Swatting has resulted in the shooting death of innocent victims , making it a particularly troubling practice more commonly associated with the gaming community.

13. Cell phone bans in school don’t prevent cyberbullying

In early 2019, the National Center for Education Statistics (NCES) released data showing that schools where cell phones were disallowed also had a higher number of principal-reported cases of cyberbullying.

14. Cyberbullying impacts sleeping habits

A 2019 study found teens who were cyberbullied were also more likely to suffer from poor sleep and depression . This finding was echoed in Ditch the Label’s 2020 report, in which 36 percent of respondents reported feeling depressed. Around one in ten respondents to Ofcom’s 2022 media usage study was able to locate information or tools to help them sleep online, but the vast majority either didn’t look or couldn’t find anything useful.

15. Being connected to peers and family helps reduce cyberbullying

According to a 2022 Ofcom study , around 45 percent of British parents trust their child to be responsible about the content they consume online, rather than relying on technical restrictions. Around half of the respondents replied that they checked in with their child about their browsing habits every few weeks, with only five percent having this conversation once and never again.

However, with 46 percent of teens offering help with the internet to relatives every week, there are clearly limits on how much family can help without a better understanding of social media platforms and their protection tools (or lack thereof).

Other research indicates that forming stronger bonds with their kids could be an effective way to help prevent bullying.  An online survey of South Australian teens aged 12-17 found that social connectedness significantly helped reduce the impact of cyberbullying.

And considering roughly 64% of students who claimed to have been cyberbullied explained that it negatively impacted both their feelings of safety and ability to learn at school, an increase in social connectedness could make a significant impact on students’ comfort in the classroom.

16. Female and LGBTQ+ cyberbullying victims are common

Girls are almost twice as likely to be victims of cybercrime while boys are more likely to be cyberbullies. There’s significant cross-over between in-person and online bullying. Researchers found 83% of students who had been bullied online in the last 30 days had also been bullied at school. Meanwhile, 69% of students who admitted to bullying others online had also recently bullied others at school.

Research also indicates that those who identify as LGBTQ+ face more significant bullying than those who identify as heterosexual. The consequences of this kind of treatment also lead to an increased rate of suicide among some LGBTQ communities and may result in decreased educational attainment.

  • Almost 40 percent of LGBTQ teens were cyberbullied in 2021, compared to 15.9% of their heterosexual peers. ( Source: CDC )
  • Between 2019 and 2021, Ditch the Label found more than 260 million instances of hate speech online.
  • Online transphobic hate speech is now up 28 percent from 2020.
  • A larger number of LGBTQ teens (18%) report not attending schools to avoid bullying, compared to 9.7 percent of heterosexual teens. ( Source: CDC )
  • Black LGBTQ youth are more likely to face mental health issues due to cyberbullying when compared to non-black LGBTQ youth and youth who identify as heterosexual. An American University study of CDC data found 56% of black LGBTQ youth are at risk for depression. ( Source : American University )
  • The same American University study found 38% reported suicidal thoughts within the past year, a higher rate than among heterosexual youth. ( Source : American University )
  • A 2018 study found that LGBTQ youth experienced increased cyber victimization as they aged, while heterosexual youth did not. ( Source : Computers in Human Behavior )
  • A study of 1,031 adolescents found that sexual orientation strongly correlates with cyberbullying involvement or negative mental health symptoms. ( Source : Journal of Child & Adolescent Trauma )

See also: Preventing LGBTQ+ cyberbullying

17. Vulgar words used by social media users could help to identify perpetrators

A study in the International Journal on Advanced Science, Engineering and Information Technology found that Twitter users who regularly use vulgar words in their tweets are more likely to engage in some form of cyberbullying than users who avoid vulgar language.
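The underlying idea, that a high share of vulgar terms in a user's tweets signals possible bullying behavior, can be approximated with a very simple frequency check. The sketch below is a naive illustration using a placeholder word list and threshold, not the lexicon or model from the cited study.

```python
# Naive sketch: flag users whose tweets contain a high share of vulgar terms.
# The word list and threshold are hypothetical placeholders, not the
# lexicon or model used in the cited study.
import re

VULGAR_TERMS = {"idiot", "loser", "stupid"}   # placeholder lexicon
THRESHOLD = 0.15                              # share of vulgar tokens to flag

def vulgarity_ratio(tweets):
    tokens = [t for tweet in tweets for t in re.findall(r"[a-z']+", tweet.lower())]
    if not tokens:
        return 0.0
    return sum(t in VULGAR_TERMS for t in tokens) / len(tokens)

def flag_possible_bully(tweets):
    return vulgarity_ratio(tweets) >= THRESHOLD

print(flag_possible_bully(["you are such a loser", "stupid idiot"]))  # True
```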

18. The shocking reality of children impersonating others

A Digital Citizenship report from the Cyberbullying Research Centre surveyed 2,500 US students aged 12-17. It showed that 9% of those surveyed admitted to pretending to be someone else online.

Table showing cyberbullying impersonation stats

19. More children avoid school because of cyberbullying

While traditionally you’d hear of children skipping school because of physical bullying, a poll by UNICEF found that one in five children haven’t turned up at school due to threats associated with cyberbullying.

20. Over half of the victims of online harassment know the cyberbully

Verywellfamily reports that over 64% of online harassment victims know the perpetrator from in-person encounters. Even when the cyberbully knows their victim in person, they often resort to upsetting them online by mocking their photos and leaving malicious comments. 25% of respondents say they encountered trolling in video games.

21. YouTube is one of the worst places where cyberbullying occurs

While most parents might consider YouTube a relatively harmless web service for their children to use, the reality is that the comments sections under videos are rife with trolling and cyberbullying. Around 79 percent of children who use YouTube have experienced cyberbullying on the platform.

Meanwhile, around 50 percent of young people on Facebook experience cyberbullying. That’s still far too high, but lower than the 64 percent of victims on TikTok and 69 percent on Snapchat.

22. Adults are also victims

While it is vital to protect young people against cyberbullying and cyberstalking, it is also important to remember that this problem affects many adults too. According to Pew Research from 2021, over 40% of adults have experienced cyberbullying or harassment online. This behavior often leads to stress and anxiety, which are major contributors to mental health issues.

23. As of 2019, Greece has the lowest cyberbullying rates

According to the Organisation for Economic Co-operation and Development (OECD), Greece has the lowest cyberbullying rate, with only 5% of adolescents reporting that they have been victimized by bullying online.

The highest cyberbullying rates were found in Latvia, where 25% of adolescents reported cyberbullying. Latvia was closely followed by Estonia, Hungary, Ireland, and the United Kingdom, where around 20% of adolescents reported cyberbullying.

24. Algorithms can help make people nicer

According to the latest research at Yale Law School, warnings issued automatically by algorithms can help to deter rudeness and cyberbullying.

The researchers looked at posts on Twitter that resulted in a prompt that said, “Want to review this before tweeting?” The study found that users often decided to alter their posts when asked to consider their content.

This suggests that simply being asked to consider whether a post might be rude, offensive, upsetting, or unnecessary is enough to prompt netizens to voluntarily soften their posts.

The study even found that being asked to consider a post’s tone helped those social media users remain nicer in subsequent posts too!
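A toy version of that nudge mechanism might look like the sketch below; the keyword check is a stand-in for the trained classifiers that platforms actually use to decide when to show a prompt.

```python
# Toy sketch of the "review before posting" nudge described above.
# The keyword check is a hypothetical placeholder; the platforms discussed
# use trained classifiers, not word lists, to decide when to show the prompt.
OFFENSIVE_HINTS = {"hate", "ugly", "pathetic"}   # hypothetical terms

def needs_review_prompt(draft: str) -> bool:
    """Return True if the draft should trigger 'Want to review this before tweeting?'"""
    words = set(draft.lower().split())
    return bool(words & OFFENSIVE_HINTS)

print(needs_review_prompt("you are pathetic"))    # True  -> show the prompt
print(needs_review_prompt("great game tonight"))  # False -> post normally
```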

25. UK considering rules to allow social media users to block anonymous accounts

In 2022 the UK government announced that it is considering new regulations that would allow users to block contact with any social media user who has not verified their account with a form of ID.

The government hopes that this would allow users to cut themselves off from trolls. However, it could also cause privacy concerns by forcing users to provide an ID in order to be able to communicate with other users on social media.

A need for more broad-reaching and open research

One common theme emerged as we researched various aspects of cyberbullying—a stunning lack of data. This is not to say that research on cyberbullying isn’t there. Even a simple search in research databases will reveal thousands of articles covering the topic in some form. However, most research on cyberbullying is either small in scale or lacking in depth. Most research is also based on surveys, resulting in a large variation in the results from survey to survey.

The Florida Atlantic University study represents one of the best sources of information to date. However, more is needed, including a meta-analysis of the data gathered from many other sources. Until then, publicly available cyberbullying statistics paint an incomplete picture of the ongoing issue.

Past research still holds value

Despite a lack of consistent publicly or easily-accessible data, a plethora of data from before 2015 can still help shed some valuable light on the issue. Past research and statistics reveal where cyberbullying has been and help reflect on why this issue is still a concern today.

Older data on cyberbullying include the following:

  • Most teenagers (over 80%) now use a mobile device regularly, opening them up to new avenues for bullying. ( Source: Bullying Statistics )
  • Half of all young adults have experienced cyberbullying in some form. A further 10-20% reported experiencing it regularly. ( Source: Bullying Statistics )
  • Cyberbullying and suicide may be linked in some ways. Around 80% of young people who commit suicide have depressive thoughts. Cyberbullying often leads to more suicidal thoughts than traditional bullying. ( Source : JAMA Pediatrics )
  • Almost 37 percent of kids have been cyberbully victims. Around 30 percent have been victimized more than once. ( Source : DoSomething.org )
  • 81% of students said they’d be more likely to intervene in cyberbullying if they could do so anonymously. ( Source : DoSomething.org )
  • A UK survey of more than 10,000 youths discovered that 60% reported witnessing abusive online behavior directed toward another person. ( Source : YoungMinds.org )
  • The same U.K. survey also discovered that 83% of young adults believe social networks do not do enough to prevent cyberbullying. ( Source : DoSomething.org )

Looking for more internet-related stats? Check out our roundup of  identity theft stats and facts for 2017-2023 , or our Cybercrime statistics which runs to 100+ facts and figures.



StopBullying.gov


What Is Cyberbullying


Cyberbullying is bullying that takes place over digital devices like cell phones, computers, and tablets. Cyberbullying can occur through SMS, Text, and apps, or online in social media, forums, or gaming where people can view, participate in, or share content. Cyberbullying includes sending, posting, or sharing negative, harmful, false, or mean content about someone else. It can include sharing personal or private information about someone else causing embarrassment or humiliation. Some cyberbullying crosses the line into unlawful or criminal behavior.

The most common places where cyberbullying occurs are:

  • Social Media, such as Facebook, Instagram, Snapchat, and Tik Tok
  • Text messaging and messaging apps on mobile or tablet devices
  • Instant messaging, direct messaging, and online chatting over the internet
  • Online forums, chat rooms, and message boards, such as Reddit
  • Online gaming communities

Special Concerns

With the prevalence of social media and digital forums, comments, photos, posts, and content shared by individuals can often be viewed by strangers as well as acquaintances. The content an individual shares online – both their personal content as well as any negative, mean, or hurtful content – creates a kind of permanent public record of their views, activities, and behavior. This public record can be thought of as an online reputation, which may be accessible to schools, employers, colleges, clubs, and others who may be researching an individual now or in the future. Cyberbullying can harm the online reputations of everyone involved – not just the person being bullied, but those doing the bullying or participating in it. Cyberbullying has unique concerns in that it can be:

Persistent – Digital devices offer an ability to immediately and continuously communicate 24 hours a day, so it can be difficult for children experiencing cyberbullying to find relief.

Permanent – Most information communicated electronically is permanent and public, if not reported and removed. A negative online reputation, including for those who bully, can impact college admissions, employment, and other areas of life.

Hard to Notice – Because teachers and parents may not overhear or see cyberbullying taking place, it is harder to recognize.

Laws and Sanctions

All states have laws requiring schools to respond to bullying. As cyberbullying has become more prevalent with the use of technology, many states now include cyberbullying, or mention cyberbullying offenses, under these laws. Schools may take action either as required by law or under local or school policies that allow them to discipline or take other action. Some states also have provisions to address bullying if it affects school performance. You can learn about the laws and policies in each state, including whether they cover cyberbullying.

Frequency of Cyberbullying

There are two sources of federally collected data on youth bullying:

  • The 2019  School Crime Supplement  to the National Crime Victimization Survey (National Center for Education Statistics and Bureau of Justice) indicates that, nationwide, about 16 percent of students in grades 9–12 experienced cyberbullying.
  • The 2021 Youth Risk Behavior Surveillance System (Centers for Disease Control and Prevention) indicates that an estimated 15.9% of high school students were electronically bullied in the 12 months prior to the survey.

See also " Frequency of Bullying ."


Curating Cyberbullying Datasets: a Human-AI Collaborative Approach

Christopher E. Gomez

Department of Computer Science, Northeastern Illinois University, 5500 N St. Louis Ave, Chicago, IL 60625 USA

Marcelo O. Sztainberg

Rachel E. Trana

Associated Data

The original YouTube dataset and the combined algorithmic/Amazon Mechanical Turk curated dataset are available upon request.

Code is available by request for reuse and modification as long as the original authors are referenced and the code is not used commercially.

Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated, reliable, relevant, and diverse datasets. Datasets intended to train models for cyberbullying detection are typically annotated by human participants, which can introduce the following issues: (1) annotator bias, (2) incorrect annotation due to language and cultural barriers, and (3) the inherent subjectivity of the task can naturally create multiple valid labels for a given comment. The result can be a potentially inadequate dataset with one or more of these overlapping issues. We propose two machine learning approaches to identify and filter unambiguous comments in a cyberbullying dataset of roughly 19,000 comments collected from YouTube that was initially annotated using Amazon Mechanical Turk (AMT). Using consensus filtering methods, comments were classified as unambiguous when an agreement occurred between the AMT workers’ majority label and the unanimous algorithmic filtering label. Comments identified as unambiguous were extracted and used to curate new datasets. We then used an artificial neural network to test for performance on these datasets. Compared to the original dataset, the classifier exhibits a large improvement in performance on modified versions of the dataset and can yield insight into the type of data that is consistently classified as bullying or non-bullying. This annotation approach can be expanded from cyberbullying datasets onto any classification corpus that has a similar complexity in scope.
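The filtering step described in this abstract reduces to an agreement rule: a comment is treated as unambiguous only when the majority label from the AMT workers matches a unanimous label from the algorithmic annotators. A minimal sketch of that rule, using made-up labels rather than the authors' data, might look like this:

```python
# Minimal sketch of the consensus-filtering rule described above:
# keep a comment only when the AMT majority label agrees with a
# unanimous algorithmic label. Labels here are made-up examples.
from collections import Counter

def majority(labels):
    (label, _), = Counter(labels).most_common(1)
    return label

def is_unambiguous(amt_labels, algo_labels):
    if len(set(algo_labels)) != 1:          # algorithms must be unanimous
        return False
    return majority(amt_labels) == algo_labels[0]

comments = [
    {"text": "...", "amt": ["bully", "bully", "not"], "algo": ["bully", "bully"]},
    {"text": "...", "amt": ["not", "bully", "not"],   "algo": ["bully", "not"]},
]
curated = [c for c in comments if is_unambiguous(c["amt"], c["algo"])]
print(len(curated))  # 1 -> only the first comment survives the filter
```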

Introduction

Cyberbullying, a term that first arose just before the year 2000, is a form of bullying enacted through an online space (Cyberbullying n.d. ; Englander et al.,  2017 ). It has become more prevalent, especially with the creation and increased use of social media applications such as Facebook, Twitter, Instagram, and YouTube (Kessel et al.,  2015 ; Hinduja & Patchin 2019a ; Patchin & Hinduja, 2019 ). Additional evidence suggests that cyberbullying has experienced an even more dramatic increase due to the recent Covid-19 pandemic, which caused children and teenagers, age groups most at risk of being victims of cyberbullying, to spend extended time on online applications (Gordon, 2020 ) for both academic and leisure activities. Victims of cyberbullying can exhibit both psychosocial health problems, such as depression, anxiety, and suicidal ideation, as well as psychosomatic disorders, such as headaches and fatigue (Giumenti & Kowalski,  2016 ; Hackett et al., 2019 ; Nixon, 2014 ; Vaillancourt et al., 2017 ). The inherent online and far-reaching nature of cyberbullying makes it difficult to detect and prevent, and as a result, many individuals are vulnerable to this form of abuse. This study seeks to address several challenges with cyberbullying identification by using machine learning algorithms to evaluate a recently labeled YouTube dataset composed of approximately 19,000 comments.

Cyberbullying Definitions and Identification

Companies, such as Twitter and Instagram, have been actively working to create algorithms that can be used to detect cyberbullying by flagging suspicious content in order to address, prevent, and minimize cyberbullying incidents. In 2019, Instagram rolled out a feature that issues a warning to a user if their comment is considered to be potentially offensive (Steinmetz, 2019 ). This allows the user to rethink whether they wish to continue posting the flagged content. Twitter also takes steps to limit harmful content by implementing a specific policy depending on the level of severity, such as limiting the visibility of a tweet or sending a direct message to a user who was reported (Our Range of Enforcement Options, 2020 ).

A common approach when defining cyberbullying is to combine characteristics of traditional bullying (intention, repetition, power imbalance) with devices used in cyberspace (computers, cell phones, etc.) (Englander et al., 2017). Hinduja and Patchin define cyberbullying as “willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices” (Hinduja & Patchin, 2015, p. 5; Hinduja & Patchin, 2019b). However, using the traditional criteria of repetition and power imbalance to define cyberbullying has been a source of debate among researchers (Smith et al., 2013). The delineation between a single occurrence and repetition can be unclear, since a single online action can be amplified and forwarded by multiple other participants to a larger general audience. Studies on young adult and adolescent definitions of bullying are inconsistent in terms of repetition, with some studies indicating that a single instance is sufficient or that repetition is irrelevant when identifying cyberbullying (Menesini et al., 2012; Walker, 2014), and other studies reporting that repetition is a clear component of a cyberbullying definition (Höher et al., 2014; Nocentini et al., 2010; Vandebosch & Van Cleemput, 2008). The inclusion of repetition in adult definitions of cyberbullying in work environments is also contested, with studies suggesting that context (public vs private communications) determines whether repetition is a required component of a cyberbullying definition (Langos, 2012; Vranjes et al., 2017) and that victims could themselves further promote a form of repetition by revisiting online bullying communications, thus becoming quasi-perpetrators (D’Cruz & Noronha, 2018). The criterion of power imbalance is similarly disputed by researchers as to its importance in the definition of cyberbullying. Multiple studies suggest that the power imbalance is not considered important in the definition of cyberbullying since the concept of a power imbalance is difficult to identify in a virtual space compared to a traditional bullying setting where a bully has superior strength or there are a large number of bullies (Dredge et al., 2014; Höher et al., 2014; Nocentini et al., 2010). Other studies state that the inherent nature of an online environment, and specifically the anonymity, contributes to the power imbalance by enabling perpetrators to boldly attack targets with minimal repercussions (Hinduja & Patchin, 2015; Menesini et al., 2012; Peter & Petermann, 2018; Suler, 2004).

The challenges with reaching a consensus on a common definition of cyberbullying, even among subject matter experts, impact the labeling of cyberbullying datasets and subsequently the algorithms and models derived from this data. Cyberbullying datasets are frequently labeled by human participants who may have little formal training or context on cyberbullying and, given the lack of a clear definition of cyberbullying, rely on their individual perspectives, cultural context and understandings, and personal biases when annotating data.

Annotation of Existing Cyberbullying Datasets

Using human participants to annotate data is a common practice in situations where the label cannot be obtained innately through the data. Researchers frequently have an odd number of participants determine whether content is considered bullying or non-bullying and assign a final label based on the majority vote (Rosa et al., 2019). For example, Reynolds et al. (2011) recruited three workers and stated that the reason for doing so was the subjectivity of the task, and that the wisdom of three workers provided confidence in the labeling. However, the subjectiveness of the content does not necessarily produce a unanimous agreement among workers’ labels, thus creating annotations that are themselves uncertain. Many frequently referenced cyberbullying datasets have been evaluated and labeled using an odd number of human participants. Dadvar et al. (2012) had three students label 2200 posts from Myspace, a social networking service, as harassing or non-harassing. Chatzakou et al. (2017) recruited 834 workers from CrowdFlower, a crowdsourcing site specifically made for machine learning and data science tasks, to label a Twitter dataset where they had five workers per task and, to eliminate bias, workers were only used once per task. Hosseinmardi et al. (2015) created a dataset using Instagram, a photo- and video-sharing social networking site, where they had five workers determine if a media session (media object/image and comments) was an instance of cyberaggression (using digital media to intentionally harm another person) and cyberbullying (a form of cyberaggression that is intentional, repeated, and carried out through a digital medium against a person who cannot easily defend themselves). A dataset collected from Formspring, a question-and-answer site, was originally curated using Amazon Mechanical Turk (MTurk), an online marketplace for human-related tasks, where three workers were tasked with labeling each question and answer as being bullying or not. They were also asked to rate the post on a scale of no bullying (0) to severe (10), to select any words or phrases that indicated bullying, and to add additional comments (Reynolds et al., 2011).

Many studies use MTurk for labeling purposes given its low cost and ease of use in textual cyberbullying identification. However, the use of MTurk introduces additional labeling concerns, such as the training level of MTurk workers. Wais et al. (2010) had MTurk workers annotate over 100,000 expert-verified business listings and found that most workers did not produce adequate work. The authors found that workers performed poorly on what they considered simple verification tasks, and they hypothesized that this was because the workers “find the tasks boring and ‘cruise’ through them as quickly as possible” (Wais et al., 2010). It is therefore necessary to recruit highly trained and rated workers to annotate content for cyberbullying.

Issues with labeled data using MTurk workers have also been identified in other cyberbullying datasets. An analysis of the dataset collected from Formspring found many cases where the labels were incorrectly annotated (Ptaszynski et al., 2018). In a recent survey, Rosa et al. (2019) found that only 5 out of 22 cyberbullying studies provided sufficient information on the labeling instructions provided to human participants to annotate the data. The remaining 17 studies were ambiguous when providing details to annotators for labeling purposes or when determining whether annotators were experts in the domain of cyberbullying. In the five studies (Bayzick et al., 2011; Hosseinmardi et al., 2015; Ptaszynski et al., 2018; Sugandhi et al., 2016; Van Hee et al., 2015) that provided some instruction, the annotators were given definitions of cyberbullying and/or given context for the content they were labeling. Rosa et al. (2019) also found that annotators for cyberbullying datasets, when available, were frequently students or random individuals on MTurk without specific qualifications. This suggests that while human participants are frequently employed to label cyberbullying datasets, the potential lack of qualifications or sufficient instructions can introduce bias and uncertainty into the associated labels.

Participants also have their own set of biases, cultural influences, and personal experiences that determine how they perceive specific content (Allison et al.,  2016 ; Baldasare et al., 2012 ; Dadvar et al., 2013 ). Unlike sentiment analysis, which revolves around the general sentiment of content (i.e., “I didn’t really like that movie”), cyberbullying is a direct attack on a person, or persons, that often requires situational context in order to be properly understood. As a result, an individual comment taken out of context can be interpreted in multiple ways. Furthermore, since workers perform their tasks remotely, it is challenging to verify whether the worker completing the task is human or a bot, thus potentially broadening the problem’s complexity (Ahler et al., 2019 ; Kennedy et al., 2020 ). The combination of these issues makes it uniquely challenging to collect a reliably annotated dataset for the purpose of developing machine learning models to identify cyberbullying.

Algorithmic Curation of Other Datasets

As mentioned previously, a concern with using human participants to label cyberbullying datasets is that humans can introduce errors (Lin et al., 2014). To manage this problem, an identification and re-annotation process for labeled data can be implemented when at least 75% of the human-based annotations are accurate (Lin et al., 2014). One method to manage problematic data is to identify mislabeled data that negatively affects the performance of machine learning algorithms. Brodley and Friedl (1999) focused on the identification and elimination of mislabeled data that occurs because of “subjectivity, data-entry error, or inadequacy of the information used to label each object.” They implemented a set of filtering methods, referred to as majority vote and consensus filtering, to identify mislabeled data on five datasets. To achieve this, they used a set of three base-level classifiers in each of the two filtering methods. The majority vote filtering method considered an instance mislabeled when a majority of the classifiers disagreed with its original label, whereas the consensus filtering approach required that all of the classifiers disagree with the original label. Of these two approaches, they found that the majority vote method produced the best results. A limitation of this approach is that as noisy data increased within a dataset, it became less likely that the filtering methods would work (Brodley & Friedl, 1999). Guan et al. (2011) expanded on these filtering methods with “majority (and consensus) filtering with the aid of unlabeled data” (MFAUD and CFAUD). These proposed methods introduced a novel technique of using unlabeled data to aid in the identification of mislabeled data. The authors noted that the combination of using labeled data and unlabeled data is a semi-supervised learning method, as opposed to an unsupervised learning approach. However, the focus of the method is to identify mislabeled data as opposed to training a better classifier. The unlabeled data is labeled through the use of a classifier that is trained on a portion of labeled data. This then enlarges the original dataset, which can be used to further identify mislabeled data. The limitation of this technique is that it can be difficult to determine with a strong degree of confidence that the unlabeled data was correctly labeled by the classifier.
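
As a concrete illustration of the distinction between these two filtering rules, the following minimal Python sketch flags an instance as mislabeled under each rule; the function names and example labels are illustrative stand-ins, not taken from the original studies.

```python
# Minimal sketch of the two filtering rules described by Brodley and Friedl (1999).
# `predictions` holds one predicted label per base classifier for a single instance;
# `original_label` is the label being checked.

def majority_vote_filter(predictions, original_label):
    """Flag the instance as mislabeled if a majority of classifiers disagree."""
    disagreements = sum(1 for p in predictions if p != original_label)
    return disagreements > len(predictions) / 2

def consensus_filter(predictions, original_label):
    """Flag the instance as mislabeled only if every classifier disagrees."""
    return all(p != original_label for p in predictions)

# Example: three base classifiers checking a comment originally labeled "non-bullying"
preds = ["bullying", "bullying", "non-bullying"]
print(majority_vote_filter(preds, "non-bullying"))  # True  (2 of 3 disagree)
print(consensus_filter(preds, "non-bullying"))      # False (one classifier agrees)
```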

A more recent study by Müller and Markert ( 2019 ) introduced a pipeline that can identify mislabeled data in numerical, image, and natural language datasets. The efficacy of their pipeline was evaluated by introducing noisy data, or data that was intentionally changed to be different from its original label, in an amount of 1%, 2%, or 3%, into 29 well-known real-world and synthetic classification datasets. They then manually determined whether the flagged data was indeed mislabeled. Ekambaram et al. ( 2017 ) used support vector machine and random forest algorithms to detect mislabeled data in class pairs (for example, alligator vs crocodile) in a dataset known as ImageNet, which is composed of images and has 18 classes. Using a combination of both algorithms, they were able to detect 92 mislabeled examples, which were then subsequently confirmed as having been mislabeled by human participants. Samami et al. ( 2020 ) introduced a novel method that tackled weaknesses in the majority filtering and consensus filtering approaches. They found that consensus filtering often misses noisy data because of its strict rules that require the agreement of all base algorithms to find mislabeled data, whereas majority filtering is more successful in identifying and eliminating mislabeled data, but can also eliminate correctly labeled data. To address these issues, they proposed a High Agreement Voting Filtering (HAVF) using a mixed strategy, which “removes strong and semi-strong noisy samples and relabels weak noisy instances” (Samami et al., 2020 ). The authors applied this method on 16 real-world binary classification datasets and found that the HAVF method outperformed other filtering methods on the majority of datasets.

Using machine learning–based majority voting or consensus filtering methods has been applied extensively in prior research for classification datasets focused on topics such as finance, medical diagnosis, and news media (Brodley & Friedl, 1999 ; Ekambaram et al., 2017 ; Guan et al., 2011 ; Müller & Markert, 2019 ; Samami et al., 2020 ). However, to the best of our knowledge, these methods have not yet been applied to cyberbullying datasets. Furthermore, the purpose of this study is similar to that of many of these studies, which is to find and discard mislabeled data. This can be thought of as identifying instances of cyberbullying and non-cyberbullying that most individuals will classify as belonging to those classes. In this study, we propose two filtering approaches, referred to as Single-Algorithm Consensus Filtering and Multi-Algorithm Consensus Filtering, to curate a cyberbullying dataset. Considering the difficulty with establishing a definition of cyberbullying, even among experts, and the challenges present when using human participants to label cyberbullying data, the goal of this research is to use machine learning–based filtering approaches in collaboration with human annotators to evaluate an MTurk-labeled YouTube dataset composed of approximately 19,000 comments to (1) refine a cyberbullying dataset with unambiguous instances of cyberbullying and non-bullying comments and to (2) investigate whether an independent machine learning model is more performant on the curated datasets. For the purpose of this study, we define an unambiguous instance as an instance where there is an accord between the majority decision of the annotator labels and the label generated when the AI filtering models are in unanimous agreement.

Data Collection

To provide a current corpus for classification of cyberbullying text, we collected approximately 19,000 comments that were extracted using the YouTube API between October 2019 and January 2020. Using the API, the information extracted was (1) the date the comment was made, (2) the id of the video associated with the comment, (3) the author of the video associated with the comment, (4) the author of the comment, (5) the number of likes for the comment, and (6) the comment itself. However, only the comments were used for analysis. This general corpus consists of topics that are inherently controversial in nature, such as politics, religion, gender, race, and sexual orientation, and are geared toward teenagers and adults. This data was manually labeled as bullying/non-bullying using MTurk by providing batches of comments of varying sizes to MTurk workers, as well as a definition of bullying and a warning that foul language could be found in the comments. The definition we provided was as follows:

Is the text bullying? Bullying can be described as content that is harmful, negative or humiliating. Furthermore, the person reading the text could be between the ages of 12-19 and/or may have a mental health condition such as anxiety, depression, etc.

Given this information, workers could choose to accept or reject the classification task. Three MTurk workers classified each comment in the corpus as bullying or non-bullying, where the majority classification decided the final label. The complete dataset contained 6462 bullying comments and 12,314 non-bullying comments, leading to a 34.4% bullying incidence rate, consistent with the description of a good dataset that has at least 10% to 20% bullying instances (Salawu et al., 2017 ).
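
As a simple illustration of this majority rule, the sketch below collapses three hypothetical worker annotations into a final label; the function name and example labels are ours and are not taken from the study’s code.

```python
from collections import Counter

def majority_label(worker_labels):
    """Return the label chosen by at least two of the three MTurk workers."""
    return Counter(worker_labels).most_common(1)[0][0]

# Hypothetical worker annotations for a single comment
print(majority_label(["bullying", "non-bullying", "bullying"]))  # "bullying"
```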

Preprocessing

We preprocessed the collected comments using various preprocessing methods: lowercased all text; expanded contractions; removed any punctuation; eliminated stop words; reduced redundant letters (maximum of 2 consecutive letters); and removed empty comments. The preprocessing methods were all created using custom algorithms by first tokenizing the comments into word tokens and applying the appropriate algorithm if specific conditions were met. For example, a contraction was expanded when a token matched an entry in a set of predefined contractions (i.e., “aren’t” becomes “are not”), and redundant letters were reduced when a token contained more than 2 consecutive identical letters (i.e., “cooool” becomes “cool”). We also corrected misspellings through the use of the Symmetric Delete Spelling Correction algorithm (SymSpell) (Garbe, 2020 ). Misspellings can be indicative of slang terminology that can represent bullying intent; however, for the purposes of this study, we did not include a slang/sentiment analysis. Finally, we lemmatized the text using spaCy, a natural language processing library. For specific comments (such as “I am” or “I see”), removing the stop words produced an empty comment, which we then eliminated from the dataset. After preprocessing, a final dataset of 18,735 comments remained, with 34% labeled as bullying.
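
The following sketch illustrates the kind of custom preprocessing described above. The contraction map and stop-word list are small illustrative stand-ins, and the SymSpell spelling-correction and spaCy lemmatization steps are omitted for brevity.

```python
import re
import string

# Illustrative stand-ins for the full contraction map and stop-word list
CONTRACTIONS = {"aren't": "are not", "don't": "do not", "can't": "cannot"}
STOP_WORDS = {"i", "am", "the", "a", "is", "are", "see"}

def preprocess(comment: str) -> str:
    text = comment.lower()
    # Expand contractions when a token matches the predefined map
    tokens = [CONTRACTIONS.get(t, t) for t in text.split()]
    text = " ".join(tokens)
    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Reduce runs of 3+ identical letters to 2 (e.g., "cooool" -> "cool")
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Drop stop words; comments that become empty are removed upstream
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Spelling correction (SymSpell) and lemmatization (spaCy) would follow here
    return " ".join(tokens)

print(preprocess("Aren't you coooOOol"))  # "not you cool"
```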

Feature Extraction

In the development of machine learning models, there are features that are extracted from datasets and used to train a model. These features are different depending on the nature of the dataset and the problem to be solved. For the purpose of our research, the features are the words found in the YouTube comments. To extract features from our dataset, we implemented two different approaches depending on the classification algorithm used: Bag of Words (BoW) and Word Embeddings. A popular method to develop a word embedding model is known as Word2Vec (Mikolov et al., 2013 ), which requires a large corpus of text data to be properly trained. Given our small dataset, we opted to use a pre-trained Word2Vec model based on GoogleNews for our experimentation (Word2Vec, 2013 ). We applied the BoW approach to the naive Bayes, support vector machine, and artificial neural network algorithms, and word embeddings were applied to the convolutional neural network algorithm.
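
A minimal sketch of the two feature-extraction approaches is shown below, assuming scikit-learn for the BoW representation and gensim for the pre-trained GoogleNews Word2Vec model; the embedding portion is left commented out because it requires downloading the large pre-trained file separately.

```python
from sklearn.feature_extraction.text import CountVectorizer

comments = ["you are cool", "nobody likes you"]  # toy stand-ins for preprocessed comments

# Bag of Words features, used here for NB, SVM, and the ANN
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(comments)
print(vectorizer.get_feature_names_out())
print(X_bow.toarray())

# Pre-trained GoogleNews Word2Vec embeddings for the CNN (illustration only;
# the binary model file must be downloaded separately)
# from gensim.models import KeyedVectors
# w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
# vector = w2v["cool"]  # 300-dimensional embedding
```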

Model Creation

We implemented four different machine learning algorithms: naive Bayes (NB), support vector machine (SVM), a convolutional neural network (CNN), and a feed-forward multilayer perceptron, a class of artificial neural network that we refer to as ANN for simplicity. NB is a probabilistic classifier, based on Bayes’ theorem, that produces a probability of a comment being bullying based on the occurrence of words in the comments (Nandhini & Sheeba, 2015 ). The implementation of naive Bayes used here is multinomial naive Bayes, which is appropriate for word counts, such as the use of BoW described previously. This implementation of naive Bayes has a smoothing parameter termed alpha that is used to address instances of zero probability. We left this parameter at its default setting of 1 to allow some probability for all words in each prediction. The SVM algorithm utilizes a kernel function to separate classes in a higher-dimensional space when the data are not linearly separable in their original feature space. SVM accomplishes this through the use of hyperplanes and support vectors, which separate the classes (bullying or non-bullying) based on the nearest training data points (i.e., the comments that are hardest to assign to either class) (Dinakar et al., 2012 ). By focusing on the nearest data points for each class, SVM can find the optimal decision boundary to separate the data. The CNN and ANN algorithms are both neural networks, or interconnected networks of nodes that mimic a biological neural network arranged in layers (Géron, 2019 ; Minaee et al., 2020 ). The ANN used in this research is a deep neural network where the first layer is an embedding layer and the output layer is binary (bullying or non-bullying), with the middle layers known as hidden layers. The ANN model included two hidden layers, with the first of these hidden layers composed of 15 neurons and the second layer composed of 10 neurons. The neurons within the hidden layers utilized the Rectified Linear Unit (ReLU) activation function, and the output layer was based on the sigmoid activation function. A CNN relies on filters that traverse through comments via word groupings (2 words, 3 words, and 4 words) to identify key pieces of information that can aid in text classification. The ability of a CNN to group words together makes it unique in its implementation compared to the other algorithms used in this study. The CNN connects to an artificial neural network, which uses the same parameters as the ANN described previously.
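
As an illustration of the ANN architecture described above, the following simplified TensorFlow/Keras sketch builds a feed-forward network with two hidden layers of 15 and 10 ReLU neurons and a sigmoid output. The input size and the omission of the embedding layer are simplifying assumptions on our part, not the authors’ exact configuration.

```python
import tensorflow as tf

vocab_size = 10000  # illustrative vocabulary size for a BoW input

ann = tf.keras.Sequential([
    tf.keras.Input(shape=(vocab_size,)),
    tf.keras.layers.Dense(15, activation="relu"),    # first hidden layer
    tf.keras.layers.Dense(10, activation="relu"),    # second hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # bullying / non-bullying
])
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
ann.summary()
```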

The NB, SVM, and CNN algorithms were used in the consensus filtering methods to evaluate the YouTube dataset and identify unambiguously labeled comments (i.e., all algorithm predictions are in agreement with the majority decision of the annotators’ labels), and the ANN was used as an independent model solely for the purpose of performance evaluation of the curated versions of the dataset. Each algorithm was chosen due to its success in general text classification tasks (Kowsari et al., 2019 ; Minaee et al., 2020 ), as well as text classification related to cyberbullying (Rosa et al., 2018 ). For all instances of supervised learning, we used an 80–20 split, where 80% of the data was used for training and 20% was used for evaluating the trained models. Python’s scikit-learn library (Pedregosa et al., 2011 ) was used to split the dataset into training and test sets, to generate the BoW features, and to implement NB and SVM. Both neural networks (ANN and CNN) were implemented using TensorFlow (Abadi et al., 2016 ). The performance of the ANN on modified datasets was evaluated using two scoring metrics: accuracy and F-score. In nearly balanced datasets, accuracy provides the proportion of correct predictions out of the total number of samples. Recall and precision are typically observed together, where recall measures the proportion of comments in each class that were correctly predicted, and precision identifies the percentage of predicted comments that actually belonged to their respective class. Instead of using recall and precision individually, we use F-score, the harmonic mean of recall and precision, which provides a more appropriate measure of the incorrectly classified cases in an unbalanced setting. More specifically, we use a macro F-score, which is a preferable metric when working with an imbalanced class distribution because it applies the same weight to each class, regardless of the number of instances in each class. By considering both accuracy and F-score, we are able to provide a more rounded assessment of the overall efficacy of the models.
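
The sketch below illustrates the 80–20 split and the two scoring metrics using scikit-learn on toy stand-in data; it trains a multinomial naive Bayes classifier with the default alpha of 1, as described above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

# Toy stand-ins for the preprocessed comments and their MTurk majority labels
comments = ["you are cool", "nobody likes you", "great video", "you are an idiot",
            "nice work", "go away loser", "love this channel", "you are so dumb"]
labels = [0, 1, 0, 1, 0, 1, 0, 1]  # 0 = non-bullying, 1 = bullying

X = CountVectorizer().fit_transform(comments)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels)

clf = MultinomialNB(alpha=1.0)  # default smoothing parameter, as described above
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print("macro F-score:", f1_score(y_test, y_pred, average="macro"))
```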

Modified Dataset Creation

The goal of this research is to create a curated version of the YouTube dataset in order to better understand and identify instances of bullying and non-bullying comments in a large, varied dataset by isolating those instances where the consensus filtering models are in unanimous agreement with the majority decision of the annotators’ labels, referred to as unambiguous instances. To do this, we implemented two filtering algorithms, which we refer to as Single-Algorithm Consensus Filtering (SACF) and Multi-Algorithm Consensus Filtering (MACF). A consensus refers to a unanimous labeling result following the application of a filtering method to a given comment.

For both CF methods, the dataset was divided into subsets of unique comments, with one subset reserved for testing and the remaining subsets used for training. The SACF method, which uses a single algorithm (SVM), divides the data into 10 subsets: one is reserved for testing, and eight of the remaining nine are used to train a model, since the training process requires eight subsets to retain an 80–20 train-test split. As a result, one subset is left out of each training run. To ensure that the test set is properly evaluated against every possible model, we used cross-validation and trained through nine iterations, rotating out a different subset to ignore in each iteration, resulting in nine predictions per comment. Following the repeated evaluations on each test set, we analyzed the classification generated by the model for all the comments in that test set. If the consensus after nine evaluations matches the MTurk majority annotation for a comment, we reserve that comment to be used for future analysis.
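
One way to read the SACF procedure is sketched below; the subset construction, default SVM hyperparameters, and variable names are our assumptions rather than the authors’ exact implementation. `X` is assumed to be a BoW matrix and `y` the corresponding MTurk majority labels.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def sacf(X, y, n_subsets=10, seed=42):
    """Sketch of Single-Algorithm Consensus Filtering: for each test subset,
    train nine SVMs (each omitting a different one of the other nine subsets)
    and keep comments whose nine predictions all match the MTurk label."""
    y = np.asarray(y)
    kept = []
    folds = list(KFold(n_splits=n_subsets, shuffle=True, random_state=seed).split(X))
    for fold_pos, (_, test_idx) in enumerate(folds):
        other_folds = [f for i, f in enumerate(folds) if i != fold_pos]
        preds = []
        for left_out in range(len(other_folds)):
            # Train on eight of the nine non-test subsets, ignoring one each time
            train_idx = np.concatenate(
                [other_folds[i][1] for i in range(len(other_folds)) if i != left_out])
            model = SVC().fit(X[train_idx], y[train_idx])
            preds.append(model.predict(X[test_idx]))
        preds = np.array(preds)                         # shape: (9, n_test)
        unanimous = (preds == preds[0]).all(axis=0)     # all nine predictions agree
        agrees = unanimous & (preds[0] == y[test_idx])  # ...and match the MTurk label
        kept.extend(test_idx[agrees])
    return kept  # indices of unambiguous comments
```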

The MACF method relies on three algorithms: NB, SVM, and CNN. Instead of 10 subsets, there are five subsets where one is reserved for testing and the remaining four are used for training a model. Unlike the SACF approach, there is no rotation of subsets in the training set, since the four remaining subsets meet the 80–20 train-test split requirements and we use the consensus label of the three algorithms (NB, SVM, and CNN) for evaluation. If the predicted labels of all three algorithms match the MTurk label, then the comment is reserved for future analysis. This approach, which differs significantly from the SACF approach, makes it possible to filter comments in a manner that the single-algorithm approach may not, since it does not rely on the decision of a single algorithm to identify correctly labeled comments.
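
A corresponding sketch of the MACF procedure is shown below. So that the example stays self-contained, logistic regression stands in for the word-embedding CNN; this substitution, along with the default hyperparameters, is ours and not part of the original method.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

def macf(X, y, n_subsets=5, seed=42):
    """Sketch of Multi-Algorithm Consensus Filtering: train three classifiers on
    the four training subsets and keep comments where all three predictions
    match the MTurk label (logistic regression stands in for the CNN here)."""
    y = np.asarray(y)
    kept = []
    for train_idx, test_idx in KFold(n_splits=n_subsets, shuffle=True,
                                     random_state=seed).split(X):
        models = [MultinomialNB(), SVC(), LogisticRegression(max_iter=1000)]
        preds = np.array([m.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
                          for m in models])
        unanimous = (preds == preds[0]).all(axis=0)     # all three algorithms agree
        agrees = unanimous & (preds[0] == y[test_idx])  # ...and match the MTurk label
        kept.extend(test_idx[agrees])
    return kept
```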

Resulting Modified Datasets

A modified version of the dataset was created by each of the filtering methods based on their results. We refer to these new versions as SA-CDS and MA-CDS (where CDS represents the terminology Curated Data Set). These datasets were constructed by identifying comments where the filtering methods unanimously agreed with the MTurk label. The two modified versions of the dataset were assembled based on the results from the consensus filtering methods [Single-Algorithm Consensus Filtering (SACF), Multi-Algorithm Consensus Filtering (MACF)] and the MTurk verification process as follows: (1) a version that contains the comments resulting from the consensus agreement between the MTurk label and the SACF method (SA-CDS) and (2) a version that contains the results from the consensus agreement between the MTurk label and the MACF method (MA-CDS).

Modified Dataset Evaluation

An artificial neural network (ANN) was implemented to test the performance on all datasets. The ANN algorithm was used so as to remove possible algorithmic bias that would occur from using one of the algorithms included in the SA or MA consensus filtering methods. In order to conduct a fair experiment, all datasets were split with an 80–20 split for a training set and test set, respectively. Each filtering method produced a dataset with different total numbers of comments, and the bullying to non-bullying ratio was different for each dataset within the training and test sets due to the uniqueness of the filtering methods’ annotation processes.

Consensus Filtering Label Agreement Results

We hypothesized that the filtering methods would, in many instances, unanimously agree with the majority decision of the MTurk labels, which we refer to as unambiguous instances of non-bullying or bullying. With the SACF method, a unanimous agreement with the MTurk label is described as the case where the algorithm predicted the same label for a comment for all nine iterations and also matched the MTurk label. A unanimous agreement for the MACF method is described as the case where each of the three algorithms predicted the same label as that of the MTurk workers’ label. In our analysis, we found that from the 18,735 comments in the YouTube dataset, the SACF method produced labels that unanimously agreed with the MTurk label in 9489 comments and the MACF method produced labels that unanimously agreed with the MTurk label in 9679 comments. We also investigated cases where the filtering methods produced a predicted label that unanimously disagreed with the MTurk label, although these results are not used as part of the modified datasets. Similar to the unanimous agreement analysis, the number of instances of unanimous disagreement for each filtering method was roughly equivalent, with more occurrences of bullying than non-bullying. The SACF unanimously disagreed with the MTurk labels of 3365 comments, of which 1060 were originally instances of non-bullying comments and 2305 were originally instances of bullying comments. The MACF method unanimously disagreed with the MTurk labels of 3377 comments, where 1175 originally belonged to the non-bullying class and 2202 originally belonged to the bullying class.

Given that the goal of both filtering methods is to identify unambiguous comments, we analyzed the number of comments that were commonly identified by both filtering methods as correctly labeled. In total, there were 8017 comments in common where both filtering methods unanimously agreed with the MTurk label (6722 non-bullying and 1295 bullying) and 2324 comments in common where both filtering methods unanimously disagreed with the MTurk label (1689 bullying and 635 non-bullying). Further analysis of the unanimously agreed 8017 comments showed that the MTurk workers themselves completely agreed on the label on 4203 of these comments, with 564 belonging to the bullying class and 3639 belonging to the non-bullying class. Similarly, we investigated the 7578 comments that were removed during the filtering process. Of these comments, the MTurk workers all agreed on the same label for 2340 of the comments, with 1128 identified as bullying and 1212 identified as non-bullying.

Modified Dataset Descriptions and Distributions

The overarching goal of this study is to explore those instances of bullying and non-bullying comments identified as unambiguous using the consensus filtering methods and the MTurk labels, and to create a revised version of the dataset which can be subsequently used to develop a machine learning model that can more accurately predict instances of cyberbullying or non-bullying. As described in the “Methods” section, two modified datasets were curated. We refer to these two datasets as SA-CDS and MA-CDS, respectively. With these new datasets, we also display MTurk labeling information from the original YouTube dataset, simply termed YouTube. Given that each filtering method is different in its implementation, we hypothesized that each approach would produce different-sized datasets with a different ratio of bullying and non-bullying comments. The original dataset contained 18,735 comments with 6462 labeled as bullying and 12,273 labeled as non-bullying. The curated datasets that were formed from the filtering methods, SA-CDS and MA-CDS, contained similar numbers of comments, with SA-CDS containing 9489 comments and MA-CDS containing 9679 comments (see Table 1). The bullying to non-bullying ratio was approximately equal across the two curated datasets, with SA-CDS containing 1721 bullying comments and 7768 non-bullying comments (~ 1:4.5 ratio) and MA-CDS containing 1835 bullying comments and 7844 non-bullying comments (~ 1:4.3 ratio) (see Table 1). While the proportion of bullying comments in these modified datasets is notably smaller than in the YouTube dataset (in which roughly one in every 2.9 comments is labeled as bullying), they still adhere to the description of a good dataset that has at least 10% to 20% bullying instances (Salawu et al., 2017 ).

Table 1 Analysis of the non-bullying and bullying comment ratio that resulted from the unanimous agreement of the filtering methods with the MTurk label

                 YouTube    SA-CDS    MA-CDS
Bullying            6462      1721      1845
Non-bullying      12,273      7768      7840
Total             18,735      9489      9685

Artificial Neural Network Evaluation on Modified Datasets

An ANN was used to test for classification performance on all datasets, using accuracy and F-score as performance metrics (see Table 2). To establish a baseline, we measured the performance of the ANN on the YouTube dataset and found that it had a classification accuracy of 67% and an F-score of 63%. We then used the ANN to measure the classification performance on the modified datasets. The ANN performed similarly on the SA-CDS and MA-CDS datasets with an accuracy of 96% on both and an F-score of 93% and 92%, respectively. The two modified datasets demonstrated a strong increase in performance compared to the original YouTube dataset (see Table 2 and Fig. 1), with a 28% increase in accuracy and a range of 28%–30% increase in F-score.

Table 2 Accuracy and F-score on all datasets using an artificial neural network (ANN)

            YouTube    SA-CDS    MA-CDS
Accuracy        67%       96%       96%
F-score         63%       93%       92%

Fig. 1 Accuracy and F-score for all datasets using an artificial neural network (ANN)

The objectives of this research were twofold: (1) to apply a collaborative approach of human labeling with consensus filtering methods to refine a cyberbullying dataset with unambiguous instances of cyberbullying and non-bullying comments and (2) to investigate whether an independent machine learning model is more performant on the curated datasets. A curated dataset based on unambiguous instances of cyberbullying can be used to develop more performant cyberbullying detection models, which could be subsequently used to initially distinguish between clear instances of cyberbullying and those cases that may require further analysis or more specific language models. The filtering methods, SACF and MACF, identified roughly the same percentage of both bullying and non-bullying comments, resulting in datasets that, when tested with a completely independent algorithm, produced similar classification performance with high accuracy. This suggests that both filtering methods can be used to curate datasets that can be used to develop more performant models. When viewing the commonality of the filtering methods, both approaches unanimously agreed with the MTurk label on 8017 comments (84% of the total comments identified by SACF and 83% of the total comments identified by MACF). Given the differences in the filtering method algorithms and implementations, the high percentage of commonality indicates that both approaches can be used with confidence to curate datasets that can be used to create more performant models. The SACF method relies on one algorithm, SVM, and this may be sufficient to identify unambiguous instances. However, the unique approach of MACF is that each algorithm used has a distinct implementation from the others, which has the potential to mimic how individuals from different backgrounds could view a given comment with dissimilar perspectives. In this approach, if the algorithms’ predictions are in unanimous agreement on a comment’s label, then it may be a strong indication that the annotator label is correct.

Our analysis also found that of the 8017 comments where both filtering methods agreed on the label, a little over half (52%) of these comments had labels where all three MTurk workers were also in complete agreement on the annotation. In contrast, of the 7578 comments removed during the filtering process, there were only 2340 comments (31%) where the MTurk workers completely agreed on the designated annotation. This shows that in cases where both filtering methods identified a comment as unambiguous, there is a higher likelihood that those comments also had complete agreement among the MTurk workers’ annotations. This has implications for further research into models developed on an even more refined version of the curated datasets containing only those comments that are identified as unambiguous by the filtering methods and where the MTurk workers’ annotations are in complete agreement.

When using the ANN to develop a model on the modified datasets, we found that it performed slightly better on the SA-CDS. While the accuracy of the ANN on SA-CDS and MA-CDS was 96%, the test sets used to attain this metric from the datasets were imbalanced; therefore, the F-score is a better measure (see Table 2 and Fig. 1). The F-score of the ANN on SA-CDS was 93%, whereas the F-score on MA-CDS was only slightly lower at 92% (see Table 2 and Fig. 1). This similarity in performance is unsurprising given that about 83% of each dataset has comments in common with the other, suggesting that their differences are negligible. It is possible that using a combination of different algorithms with the MACF approach may produce superior results to the ones described in this study. For example, a combination of neural network–based algorithms may be preferable, such as a word embedding CNN (identical to the one used in this work), a character embedding CNN, and a recurrent neural network (Minaee et al., 2020 ), which have all shown success in text classification. Another option could be to use an ANN, similar to the one presented in this study, but with different word representation methods (i.e., BoW, character embeddings, word embeddings, etc.) and different lengths in word sequences (i.e., different n-grams). Using different word representations provides diverse perspectives that can help in identifying bullying content from different viewpoints, which, in the case of unanimous agreement, indicates confidence in that label.

Although the goal of this research is to refine a cyberbullying dataset with unambiguous instances of bullying and non-bullying comments and to filter out those comments with potentially uncertain labels, there were instances of comments where the filtering methods unanimously disagreed with the MTurk label. Interestingly, the number of comments where the filtering methods unanimously disagreed with MTurk labels was similar (SACF: 3365, MACF: 3377), even when considering the ratio of bullying to non-bullying comments (SACF: ~ 2.17:1, MACF: ~ 1.87:1). However, unlike the case with unanimous agreement, when viewing the commonality of the filtering methods, both approaches unanimously disagreed with the MTurk label on 2322 comments. This is approximately 69% of the comments that unanimously disagreed with the MTurk labels identified by each filtering approach separately. This class of comments is worth discussing because if the proposed filtering methods are to rely on consensus as a way of identifying unambiguous instances of bullying and non-bullying comments, then this set of identified comments has consistently disputed the MTurk label, and further investigation is needed to more fully understand what is unique about this sub-dataset and what properties cause it to be consistently disputed by both filtering approaches. One possibility is that this subset, or some portion of this subset, has been incorrectly labeled by MTurk and the filtering methods have predicted the correct labels. It is also notable that the majority of these comments did not belong to the non-bullying class, which is the dominant class, but rather to the bullying class.

Limitations

A limitation of this study is that, although an ANN was used instead of one of the filtering algorithms to remove the algorithmic bias that may occur when a dataset is evaluated with an algorithm that helped create it, an algorithm with a more distinct implementation could have been used to further distance the evaluation model from the filtering algorithms. The ANN utilized a BoW approach to represent words, which was also used with naive Bayes and SVM during the filtering process. The ANN could instead have used word embeddings alone, possibly with an n-gram approach, thus incorporating a representation that is not as directly related to the one used in the filtering algorithms. As an alternative, an algorithm such as random forest, which uses majority voting to decide on a classification, could also be used in place of the ANN given that its implementation is substantially different from those of NB and SVM.

A second limitation of this study centers around the size of the dataset. While the content of the dataset is relatively current (late 2019), the size of the dataset was limited (approximately 19,000 comments) and machine learning model performance is dependent on the size of the dataset used to train the model. A larger dataset has the benefit of including a more diverse vocabulary compared to the one developed through the dataset in this study, especially when using a BoW text representation. Continuously expanding the vocabulary to encompass relevant cultural and societal terminology is essential to address the evolving character of cyberbullying, and could produce models with greater relabeling accuracy compared to the models developed in this research.

A third limitation is that the instances where the filtering methods unanimously disagreed with the MTurk label may require further analysis. We reported and briefly discussed those results in this study, but we did not perform any testing on this subset, which could have provided further insight into what these instances represent and whether a majority are simply comments that were incorrectly labeled.

A fourth limitation of this study concerns the process itself. At this point, our strategy relies on a comparison of the algorithmically agreed results to the original human consensus annotations. While this reduces the size of the generated dataset and makes it highly sensitive to the original group of annotators, it allows for a deeper understanding and analysis of the type of original data that is consistently classified as bullying or non-bullying. As we improve our understanding of cyberbullying datasets and classification outcomes, a future goal is to remove this dependency on the original annotations while maintaining model accuracy.

Finally, the MTurk label used was based on the majority vote of the three MTurk workers, where if at least two workers labeled a comment as bullying, the final label was bullying. Our filtering methods depend on unanimous agreement among the iterations of the single algorithm or among the three algorithms in the multi-algorithm approach. Different results may have been produced if we had only considered instances where there was unanimous agreement among the algorithms as well as among all three workers. This is something that should be investigated further, because it may filter out additional uncertainty, which could result in a superior dataset for unambiguous cyberbullying detection.

Future Work and Implications

We have shown that using machine learning algorithms, as part of single or multiple filtering approaches, to evaluate a YouTube dataset allowed us to (1) curate modified versions of the dataset with a focus on bullying and non-bullying comments identified as unambiguous while still adhering to the definition of a good cyberbullying dataset, and to (2) create more performant classification models from those datasets, while also gaining insight into the type of data that is consistently classified as bullying or non-bullying. Datasets used to detect cyberbullying using machine learning can contain uncertain data, and this process of creating modified datasets using filtering methods can prove useful as an initial attempt at separating those data that are clear cases of bullying and non-bullying from those that are uncertain and may require further context or expert analysis as part of the identification process.

Given that most online interactions do not occur in a vacuum, a possible enhancement in cyberbullying detection is to incorporate all the elements and context of a comment in a dataset. Does the comment include an image, or is it embedded in one? How do emojis and emoticons impact the direction of the comment? Are there any slang words that could be interpreted in a different way? All of these, individually or in combination, could help improve the accuracy of our algorithms and also limit biases from MTurk or any other human curators. Lastly, we could look at creating hybrid strategies that combine supervised and unsupervised learning methods (Dinakar et al., 2012 ; Trana et al., 2020 ) that would allow for the creation of feedback loops and more adaptability of our algorithms without having to retrain them as the datasets grow and evolve.

Another possible application of these strategies is to detect clear cases of cyberbullying in real time. Currently, social media and other online outlets use proprietary machine learning algorithms to flag potentially offensive comments as detailed in their Terms of Service. On some platforms, such as Twitter, users can implement settings where they can review all flagged content first, or choose to have it automatically blocked. This two-step process is similar to what we presented and has the potential to be improved by increasing the accuracy of the methods used to select apparent occurrences of cyberbullying. In a similar manner, we can adapt our strategies to detect accurately labeled comments in domains like politics, science, social issues, and others. The results of this research study suggest an algorithmic framework to formally analyze and initially assess cyberbullying datasets. While human participants are still needed to provide a foundation for annotation, the use of multiple algorithms provides a scaffolding structure that could eventually incorporate unsupervised models that have been trained to recognize cultural colloquialisms and contemporary slang terminology, as well as context, thus addressing the inherent subjectivity of using human annotators. Additionally, the ability to make use of algorithms to dynamically recognize and identify new harmful or malicious content can further reduce the financial obligation required for recruiting human participants to create large-scale comprehensive datasets, thus creating new pathways and opportunities for research on preventing cyberbullying, with an ultimate goal of creating safer online spaces. It is important to note that the goals of these strategies are not to completely replace human decision-making and outperform experts, or to use AI-based methods to police online domains, but rather to help develop clear definitions surrounding harmful commentary and to help recognize human error and bias in data.

Acknowledgements

This work was supported by the U.S. Department of Education Title III Award #P031C160209 and the Northeastern Illinois University COR grant (2019–2020). We would also like to thank Dr. Rachel Adler, Amanda Bowers, Akshit Gupta, Sebin Puthenthara Suresh, Luis Rosales, Joanna Vaklin, and Ishita Verma for their participation in this research project.

Author Contribution

All of the authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Christopher Gomez. The first draft of the manuscript was written by Christopher Gomez, and all of the authors commented on subsequent versions of the manuscript. All of the authors read and approved the final manuscript.

Funding

This work was supported by the U.S. Department of Education Title III Award #P031C160209 and the Northeastern Illinois University COR grant (2019–2020).

Availability of Data and Material

Code Availability

Declarations

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the Helsinki Declaration of 1975 and its later amendments or comparable ethical standards. The study was approved by the Ethics Committee of Northeastern Illinois University (No. 19–060).

Informed consent was obtained from all individual participants included in the study (No. 19–060).

Not applicable.

The authors declare no competing interests.

  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., & Kudlur, M. (2016). Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation,  16, 265–283.
  • Ahler, D. J., Roush, C. E., & Sood, G. (2019). The micro-task market for lemons: Data quality on Amazon’s Mechanical Turk. In Meeting of the Midwest Political Science Association .
  • Allison KR, Bussey K. Cyber-bystanding in context: A review of the literature on witnesses’ responses to cyberbullying. Children and Youth Services Review. 2016;65:183–194. doi: 10.1016/j.childyouth.2016.03.026.
  • Baldasare, A., Bauman, S., Goldman, L., & Robie, A. (2012). Cyberbullying? Voices of college students. In Misbehavior online in higher education . Emerald Group Publishing Limited.
  • Bayzick, J., Kontostathis, A., & Edwards, L. (2011). Detecting the presence of cyberbullying using computer software.
  • Brodley CE, Friedl MA. Identifying mislabeled training data. Journal of Artificial Intelligence Research. 1999;11:131–167. doi: 10.1613/jair.606.
  • Chatzakou, D., Kourtellis, N., Blackburn, J., De Cristofaro, E., Stringhini, G., & Vakali, A. (2017). Mean birds: Detecting aggression and bullying on twitter. In Proceedings of the 2017 ACM on Web Science conference , 13–22.
  • Cyberbullying (n.d.). In Merriam-Webster’s online dictionary. Available at: http://www.merriam-webster.com/dictionary/cyberbullying . Accessed May 19, 2021.
  • Dadvar, M., Jong, F. D., Ordelman, R., & Trieschnigg, D. (2012). Improved cyberbullying detection using gender information. In Proceedings of the Twelfth Dutch-Belgian Information Retrieval Workshop (DIR 2012). University of Ghent.
  • Dadvar M, Trieschnigg D, Ordelman R, de Jong F. Improving cyberbullying detection with user context. In: European Conference on Information Retrieval. Springer; 2013. pp. 693–606.
  • D’Cruz P, Noronha E. Abuse on online labour markets: Targets’ coping, power and control. Qualitative Research in Organizations and Management. 2018;13(1):53–78. doi: 10.1108/QROM-10-2016-1426.
  • Dinakar K, Jones B, Havasi C, Lieberman H, Picard R. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS). 2012;2(3):1–30. doi: 10.1145/2362394.2362400.
  • Dredge R, Gleeson J, de la Piedad Garcia X. Cyberbullying in social networking sites: An adolescent victim’s perspective. Computers in Human Behavior. 2014;36:13–20. doi: 10.1016/j.chb.2014.03.026.
  • Hackett, L., Verjee, L., Jones, S., Bauman, S., Smith, R., Everett, H. (2019) Ditch the label: The annual bullying survey (2019). Resource Document. https://www.ditchthelabel.org/wp-content/uploads/2019/11/The-Annual-Bullying-Survey-2019-1.pdf . Accessed October 12, 2020.
  • Ekambaram, R., Goldgof, D. B., & Hall, L. O. (2017). Finding label noise examples in large scale datasets. In 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE . 2420–2424.
  • Englander E, Donnerstein E, Kowalski R, Lin CA, Parti K. Defining cyberbullying. Pediatrics. 2017;140(Supplement 2):S148–S151. doi: 10.1542/peds.2016-1758U.
  • Garbe, W. (2020). SymSpell. Github. https://github.com/wolfgarbe/SymSpell . Accessed December 5, 2019.
  • Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media.
  • Giumetti, G. W., & Kowalski, R. M. (2016). Cyberbullying matters: Examining the incremental impact of cyberbullying on outcomes over and above traditional bullying in North America. In Cyberbullying across the globe,  117–130. Springer, Cham.
  • Gordon, S. (2020). Research shows rise in cyberbullying during COVID-19 pandemic. Verywell Family. https://www.verywellfamily.com/cyberbullying-increasing-during-global-pandemic-4845901 . Accessed September 25, 2020.
  • Guan D, Yuan W, Lee YK, Lee S. Identifying mislabeled training data with the aid of unlabeled data. Applied Intelligence. 2011;35(3):345–358. doi: 10.1007/s10489-010-0225-4.
  • Hinduja, S., & Patchin, J. W. (2015). Bullying beyond the schoolyard: Preventing and responding to cyberbullying. Corwin Press.
  • Hinduja S, Patchin JW. Connecting adolescent suicide to the severity of bullying and cyberbullying. Journal of School Violence. 2019;18(3):333–346. doi: 10.1080/15388220.2018.1492417.
  • Hinduja, S., & Patchin, J. W. (2019b). Cyberbullying fact sheet: identification, prevention, and response. Cyberbullying Research Center. https://cyberbullying.org/Cyberbullying-Identification-Prevention-Response-2019.pdf . Accessed January 10, 2020.
  • Höher J, Scheithauer H, Schultze-Krumbholz A. How do adolescents in Germany define cyberbullying? A focus-group study of adolescents from a German major city. Praxis Der Kinderpsychologie Und Kinderpsychiatrie. 2014;63(5):361–378. doi: 10.13109/prkk.2014.63.5.361.
  • Hosseinmardi, H., Mattson, S. A., Rafiq, R. I., Han, R., Lv, Q., & Mishra, S. (2015). Detection of cyberbullying incidents on the instagram social network. arXiv preprint arXiv:1503.03909
  • Kennedy R, Clifford S, Burleigh T, Waggoner PD, Jewell R, Winter NJ. The shape of and solutions to the MTurk quality crisis. Political Science Research and Methods. 2020;8(4):614–629. doi: 10.1017/psrm.2020.6.
  • Kessel Schneider S, O'Donnell L, Smith E. Trends in cyberbullying and school bullying victimization in a regional census of high school students, 2006–2012. Journal of School Health. 2015;85(9):611–620. doi: 10.1111/josh.12290.
  • Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D. Text classification algorithms: A survey. Information. 2019;10(4):150. doi: 10.3390/info10040150.
  • Langos C. Cyberbullying: The challenge to define. Cyberpsychology, Behavior, and Social Networking. 2012;15(6):285–289. doi: 10.1089/cyber.2011.0588.
  • Lin, C. H., Mausam, M., & Weld, D. S. (2014). To re (label), or not to re (label). In HCOMP .
  • Menesini E, Nocentini A, Palladino BE, Frisén A, Berne S, Ortega-Ruiz R, Calmaestra J, Scheithauer H, Schultze-Krumbholz A, Luik P, Naruskov K, Blaya C, Berthaud J, Smith PK. Cyberbullying definition among adolescents: A comparison across six European countries. Cyberpsychology, Behavior, and Social Networking. 2012;15(9):455–463. doi: 10.1089/cyber.2012.0040.
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems , 3111–3119.
  • Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2020). Deep learning based text classification: a comprehensive review .
  • Müller, N. M., & Markert, K. (2019). Identifying mislabeled instances in classification datasets. In 2019 International Joint Conference on Neural Networks (IJCNN) IEEE,  1–8.
  • Nandhini, B. S., & Sheeba, J. I. (2015). Cyberbullying detection and classification using information retrieval algorithm. In Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering & Technology , 1–5.
  • Nixon CL. Current perspectives: The impact of cyberbullying on adolescent health. Adolescent Health, Medicine and Therapeutics. 2014;5:143. doi: 10.2147/AHMT.S36456.
  • Nocentini A, Calmaestra J, Schultze-Krumbholz A, Scheithauer H, Ortega R, Menesini E. Cyberbullying: Labels, behaviours and definition in three European countries. Australian Journal of Guidance and Counselling. 2010;20(2):129. doi: 10.1375/ajgc.20.2.129.
  • Our range of enforcement options. (2020). Twitter. https://help.twitter.com/en/rules-and-policies/enforcement-options . Accessed September 25, 2020.
  • Patchin, J. W., & Hinduja, S. (2019). Summary of our cyberbullying research (2007–2019). Retrieved from Cyberbullying Research Center website: https://cyberbullying.org/summary-of-our-cyberbullying-research . Accessed September 25, 2020.
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Vanderplas J. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research. 2011; 12 :2825–2830. [ Google Scholar ]
  • Peter IK, Petermann F. Cyberbullying: A concept analysis of defining attributes and additional influencing factors. Computers in Human Behavior. 2018; 86 :350–366. doi: 10.1016/j.chb.2018.05.013. [ CrossRef ] [ Google Scholar ]
  • Ptaszyński, M., Leliwa, G., Piech, M., & Smywiński-Pohl, A. (2018). Cyberbullying detection–technical report 2/2018, Department of Computer Science AGH, University of Science and Technology. arXiv preprint arXiv:1808.00926.
  • Reynolds, K., Kontostathis, A., & Edwards, L. (2011). Using machine learning to detect cyberbullying. In 2011 10th International Conference on Machine Learning and Applications and Workshops IEEE, 2 , 241–244.
  • Rosa, H., Matos, D., Ribeiro, R., Coheur, L., & Carvalho, J. P. (2018). A “deeper” look at detecting cyberbullying in social networks. In 2018 International Joint Conference on Neural Networks (IJCNN) IEEE,  1–8.
  • Rosa H, Pereira N, Ribeiro R, Ferreira PC, Carvalho JP, Oliveira S, Coheur L, Paulino P, Simão AM, Trancoso I. Automatic cyberbullying detection: A systematic review. Computers in Human Behavior. 2019; 93 :333–345. doi: 10.1016/j.chb.2018.12.021. [ CrossRef ] [ Google Scholar ]
  • Salawu, S., He, Y., Lumsden, J. (2017). Approaches to automated detection of cyberbullying: A survey. IEEE Transactions on Affective Computing.
  • Samami, M., Akbari, E., Abdar, M., Plawiak, P., Nematzadeh, H., Basiri, M. E., & Makarenkov, V. (2020). A mixed solution-based high agreement filtering method for class noise detection in binary classification. Physica A: Statistical Mechanics and its Applications , 124219.
  • Smith P. K, del Barrio, C., & Tokunaga, R. (2013). Definitions of bullying and cyberbullying: How useful are the terms? In S, Bauman, D, Cross, & J, Walker (Eds) Principles of Cyberbullying Research: Definition, Measures, and Methods,  pp. 29–40. Philadelphia, PA: Routledge.
  • Steinmetz, K. (2019). Inside Instagram’s war on bullying. Time. https://time.com/5619999/instagram-mosseri-bullying-artificial-intelligence/ . Accessed September 25, 2020.
  • Sugandhi R, Pande A, Agrawal A, Bhagat H. Automatic monitoring and prevention of cyberbullying. International Journal of Computer Applications. 2016; 8 :17–19. doi: 10.5120/ijca2016910408. [ CrossRef ] [ Google Scholar ]
  • Suler J. The online disinhibition effect. Cyberpsychology & Behavior: THe Impact of the Internet, Multimedia and Virtual Reality on Behavior and Society. 2004; 7 :321–326. doi: 10.1089/1094931041291295. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Trana RE, Gomez CE, Adler RF. International Conference on Applied Human Factors and Ergonomics. Springer; 2020. Fighting cyberbullying: An analysis of algorithms used to detect harassing text found on YouTube; pp. 9–15. [ Google Scholar ]
  • Vaillancourt T, Faris R, Mishna F. Cyberbullying in children and youth: Implications for health and clinical practice. The Canadian Journal of Psychiatry. 2017; 62 (6):368–373. doi: 10.1177/0706743716684791. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vandebosch H, Van Cleemput K. Defining cyberbullying: A qualitative research into the perceptions of youngsters. Cyberpsychology & Behavior : THe Impact of the Internet, Multimedia and Virtual Reality on Behavior and Society. 2008; 11 (4):499–503. doi: 10.1089/cpb.2007.0042. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Van Hee, C., Lefever, E., Verhoeven, B., Mennes, J., Desmet, B., De Pauw, G., & Hoste, V. (2015). Detection and fine-grained classification of cyberbullying events. In International Conference Recent Advances in Natural Language Processing , 672–680.
  • Vranjes I, Baillien E, Vandebosch H, Erreygers S, De Witte H. The dark side of working online: Towards a definition and an emotion reaction model of workplace cyberbullying. Computers in Human Behavior. 2017; 69 :324–334. doi: 10.1016/j.chb.2016.12.055. [ CrossRef ] [ Google Scholar ]
  • Wais, P., Lingamneni, S., Cook, D., Fennell, J., Goldenberg, B., Lubarov, D., & Simons, H. (2010). Towards building a high-quality workforce with Mechanical Turk. In Proceedings of Computational Social Science and the Wisdom of Crowds (NIPS), 1–5.
  • Walker CM. Cyberbullying redefined: An analysis of intent and repetition. International Journal of Education and Social Science. 2014; 1 (5):59–69. [ Google Scholar ]
  • Word2Vec. (2013). Google Code . Document Resource. https://code.google.com/archive/p/word2vec/ . Accessed December 5, 2019.


Cyberbullying: How to deal with online bullies

Technology means a cyberbully can harass and intimidate you anywhere and at any time until nowhere feels safe. But there are ways to protect yourself or your child from online bullies.

What is cyberbullying?

Cyberbullying occurs when someone uses the internet, emails, messaging, social media, or other digital technology to harass, threaten, or humiliate another person. Unlike traditional bullying, cyberbullying isn’t limited to schoolyards, street corners, or workplaces, but can occur anywhere via smartphones, tablets, and computers, 24 hours a day, seven days a week. Cyberbullies don’t require face-to-face contact and their bullying isn’t limited to just a handful of witnesses at a time. It also doesn’t require physical power or strength in numbers.

Cyberbullies can torment you relentlessly and the bullying can follow you anywhere so that no place, not even home, ever feels safe. And with a few clicks the humiliation can be witnessed by hundreds or even thousands of people online.

For those who suffer cyberbullying, the effects can be devastating. Being bullied online can leave you feeling hurt, humiliated, angry, depressed, or even suicidal. But no type of bullying should ever be tolerated.

If you or a loved one is currently the victim of cyberbullying, it’s important to remember that you’re not alone. Around half of teenagers in the U.S. have suffered from cyberbullying or online harassment and as many as 43 percent of adults working remotely have been bullied online. But whatever your circumstances, there are ways to fight back against cyberbullies, overcome the pain and anguish, and reclaim your sense of identity and self-worth.

Who cyberbullies?

Cyberbullies come in all shapes and sizes. Almost anyone with an internet connection or smartphone can cyberbully someone else, often without having to reveal their true identity. As with face-to-face bullying, all genders cyberbully, but tend to do so in different ways.

Boys tend to bully by “sexting” (sending messages of a sexual nature), posting revenge porn , or with messages that threaten physical harm. Girls, on the other hand, more commonly cyberbully by spreading lies and rumors, exposing your secrets, or by excluding you from social media groups, emails, buddy lists and the like. Because cyberbullying is so easy to perpetrate, a child or teen can easily change roles, going from cyberbullying victim at one point to cyberbully the next, and then back again.

The methods kids and teens use to cyberbully can be as varied and imaginative as the technology they have access to. This could range from sending threatening or taunting messages via email, text, social media, or IM to breaking into your email account or stealing your online identity to hurt and humiliate you. Some cyberbullies may even create a website or social media page to target you.


The effects of cyberbullying

Any type of bullying, in-person or online, can leave you feeling deeply distressed, scared, angry, or ashamed. It can take a heavy toll on your self-esteem and trigger mental health problems such as depression, anxiety, and PTSD. You may feel like you’re alone and powerless to make the bullying stop—or even that you’re somehow responsible for being bullied.

[Read: Deal with a Bully and Overcome Bullying]

In many cases, though, cyberbullying can be even more painful than face-to-face bullying because:

Cyberbullying can happen anywhere, at any time. You may experience it even in places where you’d normally feel safe, such as your home, and at times when you’d least expect it, like during the weekend in the company of your family. It can seem like there’s no escape from the taunting and humiliation.

A lot of cyberbullying can be done anonymously, so you may not be sure who is targeting you. This can make you feel even more threatened and can embolden bullies, as they believe online anonymity means they’re less likely to get caught. Since cyberbullies can’t see your reaction, they will often go much further in their harassment or ridicule than they would if they were face-to-face with you.

Cyberbullying can be witnessed by potentially thousands of people. Emails, messages, and tweets can be forwarded to many, many people while social media posts or website comments can often be seen by anyone. The more far-reaching the bullying, the more humiliating it can become.

Cyberbullying can often be permanent. Malicious lies or embarrassing images can often remain visible online indefinitely, having long-term consequences on your life, reputation, and well-being.

Cyberbullying and suicide

If cyberbullying leads to you, or someone you know, feeling suicidal, please call 1-800-273-8255 in the U.S., or visit IASP or Suicide.org to find a helpline in your country.

Tip 1: Respond to the cyberbully in the right way

If you are targeted by cyberbullies, it’s important not to respond to any messages or posts written about you, no matter how hurtful or untrue. Responding will only make the situation worse and provoking a reaction from you is exactly what the cyberbully wants, so don’t give them the satisfaction.

It’s also very important that you don’t seek revenge on a cyberbully by becoming a cyberbully yourself. Again, it will only make the problem worse and could result in serious legal consequences for you. If you wouldn’t say it in person, don’t say it online.

Instead, respond to cyberbullying by:

Saving the evidence of the cyberbullying . Keep abusive text messages or a screenshot of a webpage, for example, and then report them to a trusted adult, such as a family member, teacher, or school counselor. If you don’t report incidents, the cyberbully will often become more aggressive.

Getting help . Talk to a parent, teacher, counselor, or other trusted adult. Seeing a counselor does not mean there is something wrong with you.

Reporting threats of harm and inappropriate sexual messages to the police. In many cases, the cyberbully’s actions can be prosecuted by law.

Being relentless . Cyberbullying is rarely limited to one or two incidents. It’s far more likely to be a sustained attack on you over a period of time. So, like the cyberbully, you may have to be relentless and keep reporting each and every bullying incident until it stops. There is no reason for you to ever put up with cyberbullying.

Preventing communication from the cyberbully . Block their email address and cell phone number, unfriend or unfollow them, and delete them from your social media contacts. Report their activities to their internet service provider (ISP) or to any social media or other web sites they use to target you. The cyberbully’s actions may constitute a violation of the website’s terms of service or, depending on the laws in your area, may even warrant criminal charges.

Tip 2: Reevaluate your internet and social media habits

Spending time online, particularly on social media, can help you feel connected to friends and family around the world and find new communities, interests, and outlets for self-expression. However, spending too much time on social media can also have some negative effects.

Whether you’re on Twitter, Facebook, TikTok, SnapChat, Instagram, or another platform, heavy social media use can actually make you feel more lonely and isolated, rather than less. It can also impact your self-esteem, exacerbate common mental health problems, lead to feelings of dissatisfaction, sadness, and frustration, and of course, leave you more open to instances of cyberbullying.

Many of us have a fear of missing out (FOMO) if we’re not instantly liking, sharing, or responding to social media posts. But the truth is there are very few things that require your immediate response. Constantly checking and rechecking your phone can often be a way to mask other underlying problems, such as boredom, feelings of anxiety or depression, or the need to feel less awkward and alone in social situations.

By changing your focus to offline friends and activities, though, and making a conscious effort to spend less time on social media, you can improve your mood and mental health as well as change how cyberbullying impacts your life.

[Read: Social Media and Mental Health]

Taking a break from social media, putting away your phone , and unplugging from technology can also open you up to meeting new people—especially those who don’t spread hurtful rumors, lies, and abuse online.

Tip 3: Find support from those who don’t cyberbully

Having trusted people you can turn to for support and reassurance can help you cope with even the most spiteful and damaging experiences of cyberbullying. Reach out to connect with family and real friends or explore ways of making new friends . There are plenty of people who love and appreciate you for who you are.

Share your feelings about cyberbullying . Even if the person you talk to can’t provide answers, the simple act of opening up about how you feel to someone who cares about you can make a real difference to your mood and self-esteem. Try talking to a parent, counselor, coach, religious leader, or trusted friend.

Spend time doing things you enjoy . When you spend time pursuing hobbies and interests that bring you joy, cyberbullying can have less significance in your life. Join a sports team, rekindle an old hobby, or hang out with friends who don’t participate in bullying.

Find others who share your same values and interests . Many people are cyberbullied for not fitting in with the mainstream. Whether it’s your race, sexual orientation, beliefs, or gender that makes you a target, it’s important to remember that you’re not alone. There are lots of other people who’ve been through what you’re dealing with now, share your values, and will appreciate your individualism. Look for Meetup groups with people who share your interests, join a book group, volunteer for a cause that’s important to you , or enroll in a team, youth group, or religious organization where you’ll find like-minded people.

Tip 4: View cyberbullying from a different perspective

You can help to ease the pain of cyberbullying by viewing the problem from a different perspective. The cyberbully is a jealous, frustrated person, often trying to escape their own problems. Their goal is to have control over your feelings so that they feel tough and powerful and you feel as unhappy as they do. Don’t give them the satisfaction.

Don’t blame yourself . No matter what a cyberbully does or says about you, it’s important to remember that it’s not your fault. Never feel guilty or be ashamed of who you are or what you feel. The cyberbully is the person with the problem, not you.

Don’t beat yourself up. Don’t make a cyberbullying incident worse by reading the message over and over and punishing yourself further. Life moves so fast online that in a few days or weeks other people will likely have forgotten the incident. Instead, delete any hurtful or abusive messages and focus on the positive. There are many wonderful things about you, so be proud of who you are.

Manage your stress. Experiencing cyberbullying can leave you feeling jittery, nervous, and overwhelmed. But there are healthy ways to manage stress and build your resilience to the damaging effects of cyberbullying. Exercise, meditation, muscle relaxation, breathing exercises, and positive self-talk are all great ways to relax, burn off frustration, and build mental fortitude against future negative experiences.

[Read: Surviving Tough Times by Building Resilience]

Focus on positive aspects of your life . It’s easy to become absorbed by the negativity of cyberbullying and get trapped in a downward spiral. But you can break free of the pessimism and boost your mood and self-esteem by switching your focus to things you like and feel grateful for in your life. These don’t have to be huge things; taking a few moments each day to appreciate a kind message from a friend, the love of a family member, or joy of walking in nature can make a real difference to how you feel. Try writing down three things you’re grateful for at the end of each day.

Tip 5: Practice body positivity

Offensive name-calling is one of the most common types of cyberbullying, and it’s not unusual for bullies to resort to body shaming and weight shaming online. Appearance-based insults can be hurtful to people of any age, but teens may be especially sensitive.

When you’re adjusting to the physical changes that come with adolescence, any negative body perceptions you have can be exacerbated when you compare yourself to celebrities or even your own peers. Body-shaming comments from cyberbullies can tear down your self-esteem and have a long-lasting impact. Some research shows that body shaming can even trigger depression symptoms in teens . It’s also linked to anxiety and eating disorders.

[Read: Body Shaming]

No matter how unpleasant your experiences, though, boosting your body positivity can help counter the effects of appearance-based cyberbullying.

Focus on what you like about yourself. When an online bully insults you, you may internalize those comments and mistake them for the truth. Take note of your inner voice. Is it simply parroting the bully’s words? Are you calling yourself unattractive or inferior? Shift to healthier self-talk by making a few positive statements about yourself. Maybe you love the way your eyes and hair look. You can also build yourself up by acknowledging positive personality traits, such as your kindness or sense of humor.  

Practice self-acceptance. New digital tools, such as airbrushing and beauty filters, give people all sorts of ways to alter their appearance online. In fact, social media is filled with manipulated photos as people try to create “idealized” versions of themselves. In the process, this can skew expectations about what we and others should really look like. When a cyberbully criticizes your appearance, you might be tempted to use these tools to hide your imperfections. However, this has the potential to even further damage your self-esteem. Instead, acknowledge that your body is unique and that everyone has flaws, even if they choose to airbrush them out online.

Begin with body neutrality. If being positive about your appearance feels too difficult, start with a neutral stance. Instead of focusing on your looks, put the emphasis on what your body can do. Make a simple list of things that your body is capable of, whether that includes walking or running a mile or moving furniture. This can be a step towards better accepting and respecting your body.

Keep a healthy relationship with food. Body shaming by cyberbullies can affect how you think about food and your eating habits. Weight-based insults might even lead you to consider unhealthy diet restrictions. But it’s important to recognize that food isn’t your enemy. Don’t allow a cyberbully to have that kind of power over you. Instead, focus on eating a healthy, balanced diet, making mealtimes a happy, social experience, and using mindful eating techniques , such as savoring each bite, to increase your enjoyment of meals.

Tips for parents to stop or prevent cyberbullying

Many kids can be reluctant to tell their parents about cyberbullying out of a fear that doing so may result in losing their cell phone or computer privileges. While parents should always monitor a child’s use of technology, it’s important not to threaten to withdraw access or otherwise punish a child who’s been the victim of cyberbullying.

Spot the warning signs of cyberbullying

Unlike traditional bullying where the bruises are often easily noticeable, it can be harder for parents to spot the signs of cyberbullying. Your child may be a victim if they:

  • Seem upset, angry, or otherwise distressed as a result of time spent online or using their phone.
  • Appear anxious when receiving a text, message, or social media notification.
  • Become secretive about their online and social media activities.
  • Refuse to go to school or to specific classes, or avoid group activities.
  • Withdraw from friends, group activities, or online and in-person events they used to enjoy.
  • Suffer an unusual and sudden drop in performance at school.
  • Exhibit changes in behavior, sleeping , and eating patterns, or a decline in mood (such as signs of depression or anxiety ).

Prevent cyberbullying before it starts

One of the best ways to stop cyberbullying is to prevent the problem before it starts. To stay safe with technology, teach your kids to:

  • Refuse to pass along cyberbullying messages.
  • Tell their friends to stop cyberbullying.
  • Block communication with cyberbullies; delete messages without reading them.
  • Never post or share their personal information—or their friends’ personal information—online.
  • Never share their internet passwords with anyone, except you.
  • Talk to you about their life online.
  • Not put anything online that they wouldn’t want their friends or classmates to see, even in email.
  • Not send messages when they’re angry or upset.
  • Always be as polite online as they are in person.

Monitor your child’s technology use

Regardless of how much your child resents it, you can only protect them by monitoring what they do online.

Use parental control apps on your child’s smartphone or tablet and set up filters on your child’s computer to block inappropriate web content and help you monitor their online activities.

Limit data access to your child’s smartphone. Some wireless providers allow you to turn off text messaging services during certain hours.

Insist on knowing your child’s passwords and learn the common acronyms kids use online, in social media, and in messaging apps.

Know who your child communicates with online. Go over your child’s address book and social media contacts with them. Ask who each person is and how your child knows them.

Encourage your child to tell you or another trusted adult if they receive threatening messages or are otherwise targeted by cyberbullies, while reassuring them that doing so will not result in their loss of phone or computer privileges.

If your child is a cyberbully

It’s never easy for a parent to learn that their child is cyberbullying others, but it’s important to take action and curb your child’s negative behavior before it can have serious repercussions.

If your child has responded to being cyberbullied by employing their own cyberbullying tactics, you can help them to find better ways of dealing with the problem. If your child has trouble managing strong emotions such as anger, hurt, or frustration, talk to a therapist about helping your child learn to cope with these feelings in a healthy way.

[Read: Help for Parents of Troubled Teens]

Cyberbullying is often a learned behavior

Some cyberbullies learn aggressive behavior from their experiences at home, so it’s important to set a good example with your own online, social media, and messaging habits. As a parent, you may be setting a bad example for your kids by:

  • Sending or forwarding abusive emails, social media posts, or text messages that target coworkers, neighbors, or acquaintances.
  • Communicating with people online in ways that you wouldn’t do face-to-face.
  • Displaying bullying behavior—in-person or online—such as verbally or physically abusing others or intimidating people.

Tips for parents dealing with a child who cyberbullies

Learn about your child’s friends and social life. Sometimes a child or teen’s friends can encourage their bullying behavior online. The more regularly you talk to your child about their life and who they’re socializing with, the easier it will be to uncover any problems they may be having fitting in or building relationships with others.

Educate your child about cyberbullying. When bullying is done virtually, the bully often doesn’t see the consequences of their actions. Often, a child may not understand how hurtful and damaging their behavior online can be to others. As a parent, though, you can help to foster your child’s empathy by encouraging them to look at their behavior from the victim’s perspective. It’s also worth reminding your child that cyberbullying can have serious legal consequences.

Encourage your child to manage stress. Your child’s cyberbullying may be an attempt at relieving the stress they’re experiencing at home or at school. But there are much healthier ways to let off steam and relieve tension. Try taking up a new sport or physical activity with your child or teaching them how to practice relaxation techniques .

Set limits with technology. Let your child know that you’ll be monitoring their online behavior. If necessary, remove access to technology until behavior improves.

Establish consistent rules of behavior. While your child may resent any attempts you make to discipline them, the truth is that the rules and boundaries you set show your child that they’re worthy of your time and attention.

Bullying and cyberbullying helplines

1-800-273-8255 – Crisis Call Center (U.S.)

0845 22 55 787 – National Bullying Helpline (UK)

1-877-352-4497 – BullyingCanada (Canada)

1800 551 800 – Kids Helpline (Australia)

0800 942 8787 – 0800 What’s Up? (New Zealand)

1098 – Childline India (India)

More Information

  • Cyberbullying - Tips for teenagers in dealing with cyberbullies. (TeensHealth)
  • It Gets Better - Videos for LGBT kids and teens. (It Gets Better Project)
  • Resilience Guide for Parents and Teachers - Building resilience in children. (APA)
  • Report Cyberbullying - Tips on how and where to report online bullying. (ADL)
  • Cyberbullying - Tips for parents to help a child being cyberbullied. (KidsHealth)
  • Cyberbullying Research Center - Offers a list of social media apps, websites, gaming networks, and related companies where you can report instances of cyberbullying.
  • Namie, Gary. “2021 WBI U.S. Workplace Bullying Survey.” Workplace Bullying Institute (blog). Accessed March 30, 2022.
  • “The NCES Fast Facts Tool Provides Quick Answers to Many Education Questions.” National Center for Education Statistics. Accessed March 30, 2022.
  • Kowalski, Robin M. “Cyber Bullying: Recognizing and Treating Victim and Aggressor.” Psychiatric Times 25, no. 11 (October 1, 2008): 45.
  • Shariff, Shaheen. Cyber-Bullying: Issues and Solutions for the School, the Classroom and the Home. London: Routledge, 2008.
  • Sharp, Sonia. “How Much Does Bullying Hurt? The Effects of Bullying on the Personal Wellbeing and Educational Progress of Secondary Aged Students.” Educational and Child Psychology 12, no. 2 (1995): 81–88.
  • Wilton, Courtney, and Marilyn Campbell. “An Exploration of the Reasons Why Adolescents Engage in Traditional and Cyber Bullying.” Journal of Educational Sciences and Psychology 1, no. 2 (2011): 101–9.
  • Atske, Sara. “Teens and Cyberbullying 2022.” Pew Research Center: Internet, Science & Tech (blog), December 15, 2022.
  • Brewis, A., & Bruening, M. (2018). Weight Shame, Social Connection, and Depressive Symptoms in Late Adolescence. International Journal of Environmental Research and Public Health, 15(5), 891.
  • Vogel, L. (2019). Fat shaming is making people sicker and heavier. Canadian Medical Association Journal, 191(23), E649.
  • Vogel, Erin, and Jason Rose. “Self-Reflection and Interpersonal Connection: Making the Most of Self-Presentation on Social Media.” Translational Issues in Psychological Science 2 (September 1, 2016).


Curating Cyberbullying Datasets: a Human-AI Collaborative Approach

  • Original Article
  • Published: 22 December 2021
  • Volume 4, pages 35–46 (2022)


  • Christopher E. Gomez,
  • Marcelo O. Sztainberg &
  • Rachel E. Trana (ORCID: orcid.org/0000-0003-3878-2362)


Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated, reliable, relevant, and diverse datasets. Datasets intended to train models for cyberbullying detection are typically annotated by human participants, which can introduce the following issues: (1) annotator bias, (2) incorrect annotation due to language and cultural barriers, and (3) the inherent subjectivity of the task can naturally create multiple valid labels for a given comment. The result can be a potentially inadequate dataset with one or more of these overlapping issues. We propose two machine learning approaches to identify and filter unambiguous comments in a cyberbullying dataset of roughly 19,000 comments collected from YouTube that was initially annotated using Amazon Mechanical Turk (AMT). Using consensus filtering methods, comments were classified as unambiguous when an agreement occurred between the AMT workers’ majority label and the unanimous algorithmic filtering label. Comments identified as unambiguous were extracted and used to curate new datasets. We then used an artificial neural network to test for performance on these datasets. Compared to the original dataset, the classifier exhibits a large improvement in performance on modified versions of the dataset and can yield insight into the type of data that is consistently classified as bullying or non-bullying. This annotation approach can be expanded from cyberbullying datasets onto any classification corpus that has a similar complexity in scope.


Introduction

Cyberbullying, a term that first arose just before the year 2000, is a form of bullying enacted through an online space (Cyberbullying n.d. ; Englander et al.,  2017 ). It has become more prevalent, especially with the creation and increased use of social media applications such as Facebook, Twitter, Instagram, and YouTube (Kessel et al.,  2015 ; Hinduja & Patchin 2019a ; Patchin & Hinduja, 2019 ). Additional evidence suggests that cyberbullying has experienced an even more dramatic increase due to the recent Covid-19 pandemic, which caused children and teenagers, age groups most at risk of being victims of cyberbullying, to spend extended time on online applications (Gordon, 2020 ) for both academic and leisure activities. Victims of cyberbullying can exhibit both psychosocial health problems, such as depression, anxiety, and suicidal ideation, as well as psychosomatic disorders, such as headaches and fatigue (Giumenti & Kowalski,  2016 ; Hackett et al., 2019 ; Nixon, 2014 ; Vaillancourt et al., 2017 ). The inherent online and far-reaching nature of cyberbullying makes it difficult to detect and prevent, and as a result, many individuals are vulnerable to this form of abuse. This study seeks to address several challenges with cyberbullying identification by using machine learning algorithms to evaluate a recently labeled YouTube dataset composed of approximately 19,000 comments.

Cyberbullying Definitions and Identification

Companies, such as Twitter and Instagram, have been actively working to create algorithms that can be used to detect cyberbullying by flagging suspicious content in order to address, prevent, and minimize cyberbullying incidents. In 2019, Instagram rolled out a feature that issues a warning to a user if their comment is considered to be potentially offensive (Steinmetz, 2019 ). This allows the user to rethink whether they wish to continue posting the flagged content. Twitter also takes steps to limit harmful content by implementing a specific policy depending on the level of severity, such as limiting the visibility of a tweet or sending a direct message to a user who was reported (Our Range of Enforcement Options, 2020 ).

A common approach when defining cyberbullying is to combine characteristics of traditional bullying (intention, repetition, power imbalance) with devices used in cyberspace (computers, cell phones, etc.) (Englander et al., 2017 ). Hinduja and Patchin define cyberbullying as “willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices” (Hinduja & Patchin, 2015 p. 5, Hinduja & Patchin 2019b ). However, using the traditional criteria of repetition and power imbalance to define cyberbullying has been a source of debate among researchers (Smith et al.,  2013 ). The delineation between a single occurrence and repetition can be unclear, since a single online action can be amplified and forwarded by multiple other participants to a larger general audience. Studies on young adult and adolescent definitions of bullying are inconsistent in terms of repetition, with some studies indicating that a single instance is sufficient or that repetition is irrelevant, when identifying cyberbullying (Menesini et al., 2012 ; Walker, 2014 ), and other studies reporting that repetition is a clear component of a cyberbullying definition (Höher et al., 2014 ; Nocentini et al., 2010 ; Vandebosch & Van Cleemput, 2008 ). The inclusion of repetition in adult definitions of cyberbullying in work environments is also contested, with studies suggesting that context (public vs private communications) determines whether repetition is a required component of a cyberbullying definition (Langos, 2012 ; Vranjes et al., 2017 ) and that victims could themselves further promote a form of repetition by revisiting online bullying communications, thus becoming quasi-perpetrators (D'Cruz & Noronha, 2018 ). The criterion of power imbalance is similarly disputed by researchers as to its importance in the definition of cyberbullying. Multiple studies suggest that the power balance is not considered important in the definition of cyberbullying since the concept of a power imbalance is difficult to identify in a virtual space compared to a traditional bullying setting where a bully has superior strength or there are a large number of bullies (Dredge et al.,  2014 ; Höher et al., 2014 ; Nocentini et al., 2010 ). Other studies state that the inherent nature of an online environment, and specifically the anonymity, contribute to the power balance by enabling perpetrators to boldly attack targets with minimal repercussions (Hinduja & Patchin, 2015 ; Menesini et al., 2012 ; Peter & Petermann, 2018 ; Suler, 2004 ).

The challenges with reaching a consensus on a common definition of cyberbullying, even among subject matter experts, impact the labeling of cyberbullying datasets and subsequently the algorithms and models derived from this data. Cyberbullying datasets are frequently labeled by human participants who may have little formal training or context on cyberbullying and, given the lack of a clear definition of cyberbullying, rely on their individual perspectives, cultural context and understandings, and personal biases when annotating data.

Annotation of Existing Cyberbullying Datasets

Using human participants to annotate data is a common practice in situations where the label cannot be obtained innately through the data. Researchers frequently have an odd number of participants determine whether content is considered bullying or non-bullying and assign a final label based on the majority vote (Rosa et al., 2019 ). For example, Reynolds et al. ( 2011 ) recruited three workers and stated the reason for doing so was due to the subjectivity of the task, and that the wisdom of three workers provided confidence in the labeling. However, the subjectiveness of the content does not necessarily produce a unanimous agreement among workers’ labels, thus creating annotations that are themselves uncertain. Many frequently referenced cyberbullying datasets have been evaluated and labeled using an odd number of human participants. Dadvar et al. ( 2012 ) had three students label 2200 posts from Myspace, a social networking service, as harassing or non-harassing. Chatzakou et al. ( 2017 ) recruited 834 workers from CrowdFlower, a crowdsourcing site specifically made for machine learning and data science tasks, to label a Twitter dataset where they had five workers per task and, to eliminate bias, workers were only used once per task. Hosseinmardi et al. ( 2015 ) created a dataset using Instagram, a photo- and video-sharing social networking site, where they had five workers determine if a media session (media object/image and comments) was an instance of cyberaggression (using digital media to intentionally harm another person) and cyberbullying (a form of cyberagression that is intentional, repeated, and carried out through a digital medium against a person who cannot easily defend themselves). A dataset collected from Formspring, a question-and-answer site, was originally curated using Amazon Mechanical Turk (MTurk), an online marketplace for human-related tasks, where three workers were tasked with labeling each question and answer as being bullying or not. They were also asked to rate the post on a scale of no bullying (0) to severe (10) and to select, if any, words or phrases that indicate bullying and add additional comments (Reynolds et al., 2011 ).

Many studies use MTurk for labeling purposes given its low cost and ease of use in textual cyberbullying identification. However, the use of MTurk introduces additional labeling concerns, such as the training level of MTurk workers. Wais et al. ( 2010 ) had MTurk workers annotate over 100,000 expert-verified business listings and found that most workers do not produce adequate work. The authors found that workers performed poorly on what they considered simple verification tasks, and they hypothesized that this is because the workers “find the tasks boring and “cruise” through them as quickly as possible” (Wais et al., 2010 ). It is therefore necessary to recruit highly trained and rated workers to annotate content for cyberbullying.

Issues with labeled data using MTurk workers have also been identified in other cyberbullying datasets. An analysis of the dataset collected from Formspring found many cases where the labels were incorrectly annotated (Ptaszynski et al., 2018). In a recent survey, Rosa et al. (2019) found that only 5 out of 22 cyberbullying studies provided sufficient information on the labeling instructions provided to human participants to annotate the data. The remaining 17 studies were ambiguous when providing details to annotators for labeling purposes or when determining whether annotators were experts in the domain of cyberbullying. From the five studies (Bayzick et al., 2011; Hosseinmardi et al., 2015; Ptaszynski et al., 2018; Sugandhi et al., 2016; Van Hee et al., 2015) that provided some instruction, the annotators were given definitions of cyberbullying and/or given context to the content they were labeling. Rosa et al. (2019) also found that annotators for cyberbullying datasets, when available, were frequently students or random individuals on MTurk without specific qualifications. This suggests that while human participants are frequently employed to label cyberbullying datasets, the potential lack of qualifications or sufficient instructions can introduce bias and uncertainty into the associated labels.

Participants also have their own set of biases, cultural influences, and personal experiences that determine how they perceive specific content (Allison et al.,  2016 ; Baldasare et al., 2012 ; Dadvar et al., 2013 ). Unlike sentiment analysis, which revolves around the general sentiment of content (i.e., “I didn’t really like that movie”), cyberbullying is a direct attack on a person, or persons, that often requires situational context in order to be properly understood. As a result, an individual comment taken out of context can be interpreted in multiple ways. Furthermore, since workers perform their tasks remotely, it is challenging to verify whether the worker completing the task is human or a bot, thus potentially broadening the problem’s complexity (Ahler et al., 2019 ; Kennedy et al., 2020 ). The combination of these issues makes it uniquely challenging to collect a reliably annotated dataset for the purpose of developing machine learning models to identify cyberbullying.

Algorithmic Curation of Other Datasets

As mentioned previously, a concern with using human participants to label cyberbullying datasets is that humans can introduce errors (Lin et al.,  2014 ). To manage this problem, an identification and re-annotation process for labeled data can be implemented when at least 75% of the human-based annotations are accurate (Lin et al., 2014 ). One method to manage problematic data is to identify mislabeled data that negatively affects the performance of machine learning algorithms. Brodley and Friedl ( 1999 ) focused on the identification and elimination of mislabeled data that occurs because of “subjectivity, data-entry error, or inadequacy of the information used to label each object.” They implemented a set of filtering methods, referred to as majority vote and consensus filtering, to identify mislabeled data on five datasets. To achieve this, they used a set of three base-level classifiers in each of the two filtering methods. To consider a label as mislabeled, the majority vote filtering method required that only a majority number of the classifiers disagree with the original label. The consensus filtering method approach required that all of the classifiers disagree with the original label. Of these two approaches, they found that the majority vote method produced the best results. A limitation of this approach is that as noisy data increased within a dataset, it became less likely that the filtering methods would work (Brodley & Friedl, 1999 ). Guan et al. ( 2011 ) expanded on these filtering methods with “majority (and consensus) filtering with the aid of unlabeled data” (MFAUD and CFAUD). These proposed methods introduced a novel technique of using unlabeled data to aid in the identification of mislabeled data. The authors noted that the combination of using labeled data and unlabeled data is a semi-supervised learning method, as opposed to an unsupervised learning approach. However, the focus of the method is to identify mislabeled data as opposed to training a better classifier. The unlabeled data is labeled through the use of a classifier that is trained on a portion of labeled data. This then enlarges the original dataset, which can be used to further identify mislabeled data. The limitation of this technique is that it can be difficult to determine with a strong degree of confidence that the unlabeled data was correctly labeled by the classifier.
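To make the distinction between the two filtering strategies concrete, the following minimal sketch flags suspect labels using out-of-fold predictions from three base classifiers. The particular scikit-learn learners and the feature matrix `X` are illustrative assumptions, not the exact configurations used in the studies cited above.

```python
# Minimal sketch of majority-vote vs. consensus filtering (in the spirit of
# Brodley & Friedl, 1999). The three base learners are illustrative choices.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

def filter_mislabeled(X, y, mode="majority", cv=5):
    """Return a boolean mask marking instances suspected of being mislabeled."""
    y = np.asarray(y)
    base_learners = [MultinomialNB(), LinearSVC(), DecisionTreeClassifier()]
    # Out-of-fold predictions so each instance is judged by models not trained on it.
    preds = np.array([cross_val_predict(clf, X, y, cv=cv) for clf in base_learners])
    disagreements = (preds != y).sum(axis=0)   # how many base learners reject the label
    if mode == "majority":                     # flag if most base learners disagree
        return disagreements > len(base_learners) / 2
    return disagreements == len(base_learners)  # consensus: all must disagree

# Usage (X_bow and labels assumed available):
#   suspect = filter_mislabeled(X_bow, labels, mode="consensus")
#   X_clean, y_clean = X_bow[~suspect], np.asarray(labels)[~suspect]
```

Under this sketch, consensus filtering flags an instance only when every base learner rejects its label, which mirrors why it tends to miss noisy data, while majority filtering flags more aggressively and risks discarding correctly labeled instances.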

A more recent study by Müller and Markert ( 2019 ) introduced a pipeline that can identify mislabeled data in numerical, image, and natural language datasets. The efficacy of their pipeline was evaluated by introducing noisy data, or data that was intentionally changed to be different from its original label, in an amount of 1%, 2%, or 3%, into 29 well-known real-world and synthetic classification datasets. They then manually determined whether the flagged data was indeed mislabeled. Ekambaram et al. ( 2017 ) used support vector machine and random forest algorithms to detect mislabeled data in class pairs (for example, alligator vs crocodile) in a dataset known as ImageNet, which is composed of images and has 18 classes. Using a combination of both algorithms, they were able to detect 92 mislabeled examples, which were then subsequently confirmed as having been mislabeled by human participants. Samami et al. ( 2020 ) introduced a novel method that tackled weaknesses in the majority filtering and consensus filtering approaches. They found that consensus filtering often misses noisy data because of its strict rules that require the agreement of all base algorithms to find mislabeled data, whereas majority filtering is more successful in identifying and eliminating mislabeled data, but can also eliminate correctly labeled data. To address these issues, they proposed a High Agreement Voting Filtering (HAVF) using a mixed strategy, which “removes strong and semi-strong noisy samples and relabels weak noisy instances” (Samami et al., 2020 ). The authors applied this method on 16 real-world binary classification datasets and found that the HAVF method outperformed other filtering methods on the majority of datasets.

Machine learning–based majority voting and consensus filtering methods have been applied extensively in prior research on classification datasets focused on topics such as finance, medical diagnosis, and news media (Brodley & Friedl, 1999; Ekambaram et al., 2017; Guan et al., 2011; Müller & Markert, 2019; Samami et al., 2020). However, to the best of our knowledge, these methods have not yet been applied to cyberbullying datasets. Furthermore, the purpose of this study is similar to that of many of these studies, which is to find and discard mislabeled data. This can be thought of as identifying instances of cyberbullying and non-cyberbullying that most individuals will classify as belonging to those classes. In this study, we propose two filtering approaches, referred to as Single-Algorithm Consensus Filtering and Multi-Algorithm Consensus Filtering, to curate a cyberbullying dataset. Considering the difficulty with establishing a definition of cyberbullying, even among experts, and the challenges present when using human participants to label cyberbullying data, the goal of this research is to use machine learning–based filtering approaches in collaboration with human annotators to evaluate an MTurk-labeled YouTube dataset composed of approximately 19,000 comments in order to (1) refine a cyberbullying dataset with unambiguous instances of cyberbullying and non-bullying comments and (2) investigate whether an independent machine learning model is more performant on the curated datasets. For the purpose of this study, we define an unambiguous instance as an instance where there is an accord between the majority decision of the annotator labels and the label generated when the AI filtering models are in unanimous agreement.

Data Collection

To provide a current corpus for classification of cyberbullying text, we collected approximately 19,000 comments that were extracted using the YouTube API between October 2019 and January 2020. Using the API, the information extracted was (1) the date the comment was made, (2) the id of the video associated with the comment, (3) the author of the video associated with the comment, (4) the author of the comment, (5) the number of likes for the comment, and (6) the comment itself. However, only the comments were used for analysis. This general corpus consists of topics that are inherently controversial in nature, such as politics, religion, gender, race, and sexual orientation, and are geared toward teenagers and adults. This data was manually labeled as bullying/non-bullying using MTurk by providing batches of comments of varying sizes to MTurk workers, as well as a definition of bullying and a warning that foul language could be found in the comments. The definition we provided was as follows:

Is the text bullying? Bullying can be described as content that is harmful, negative or humiliating. Furthermore, the person reading the text could be between the ages of 12-19 and/or may have a mental health condition such as anxiety, depression, etc.

Given this information, they could choose to accept or reject the classification task. Three MTurk workers classified each comment in the corpus as bullying or non-bullying, where the majority classification decided the final label. The complete dataset contained 6462 bullying comments and 12,314 non-bullying comments, leading to a 34.4% bullying incidence rate, consistent with the description of a good dataset that has at least 10% to 20% bullying instances (Salawu et al., 2017 ).
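As a rough illustration of how the final label can be derived from the three worker annotations, the sketch below takes a simple majority vote. The file name and column layout are hypothetical; the paper does not describe how the MTurk responses were stored.

```python
# Sketch of deriving the final label from three MTurk annotations by majority vote.
# The CSV layout and column names ("worker_1" ... "worker_3") are hypothetical.
import pandas as pd

annotations = pd.read_csv("youtube_mturk_labels.csv")   # hypothetical file
worker_cols = ["worker_1", "worker_2", "worker_3"]      # 1 = bullying, 0 = non-bullying

# With three annotators, the majority label is simply "at least two agree".
annotations["label"] = (annotations[worker_cols].sum(axis=1) >= 2).astype(int)

print(annotations["label"].value_counts(normalize=True))  # ~34% bullying in our corpus
```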

Preprocessing

We preprocessed the collected comments using several methods: lowercasing all text, expanding contractions, removing punctuation, eliminating stop words, reducing redundant letters (to a maximum of two consecutive letters), and removing empty comments. These steps were implemented as custom algorithms that first tokenize each comment into word tokens and then apply the appropriate transformation when specific conditions are met. For example, contractions were expanded when a token matched a set of predefined contractions (e.g., “aren’t” becomes “are not”), and letters were reduced when a token contained more than two consecutive repeated letters (e.g., “cooool” becomes “cool”). We also corrected misspellings through the use of the Symmetric Delete Spelling Correction algorithm (SymSpell) (Garbe, 2020). Misspellings can be indicative of slang terminology that can represent bullying intent; however, for the purposes of this study, we did not include a slang/sentiment analysis. Finally, we lemmatized the text using spaCy, a natural language processing library. For some comments (such as “I am” or “I see”), removing stop words produced an empty comment, which we then eliminated from the dataset. After preprocessing, a final dataset of 18,735 comments remained, with 34% labeled as bullying.
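The sketch below outlines one possible implementation of this preprocessing pipeline. The contraction map and stop-word source are abbreviated stand-ins, and the SymSpell spelling-correction step is only indicated by a comment rather than implemented.

```python
# Minimal sketch of the preprocessing pipeline described above. The contraction
# map is abbreviated, and SymSpell-based spelling correction is omitted.
import re
import string
import spacy

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
CONTRACTIONS = {"aren't": "are not", "don't": "do not", "can't": "cannot"}  # abbreviated
STOP_WORDS = nlp.Defaults.stop_words

def preprocess(comment: str) -> str:
    tokens = []
    for tok in comment.lower().split():
        tok = CONTRACTIONS.get(tok, tok)                                  # expand contractions
        tok = tok.translate(str.maketrans("", "", string.punctuation))    # strip punctuation
        tok = re.sub(r"(.)\1{2,}", r"\1\1", tok)                          # "cooool" -> "cool"
        if tok and tok not in STOP_WORDS:
            tokens.append(tok)
    # A SymSpell lookup would run here to correct misspellings (omitted in this sketch).
    return " ".join(t.lemma_ for t in nlp(" ".join(tokens)))              # lemmatize with spaCy

# raw_comments: list of comment strings pulled via the YouTube API (assumed available).
cleaned = [p for p in (preprocess(c) for c in raw_comments) if p]         # drop empty comments
```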

Feature Extraction

In the development of machine learning models, there are features that are extracted from datasets and used to train a model. These features are different depending on the nature of the dataset and the problem to be solved. For the purpose of our research, the features are the words found in the YouTube comments. To extract features from our dataset, we implemented two different approaches depending on the classification algorithm used: Bag of Words (BoW) and Word Embeddings. A popular method to develop a word embedding model is known as Word2Vec (Mikolov et al., 2013 ), which requires a large corpus of text data to be properly trained. Given our small dataset, we opted to use a pre-trained Word2Vec model based on GoogleNews for our experimentation (Word2Vec, 2013 ). We applied the BoW approach to the naive Bayes, support vector machine, and artificial neural network algorithms, and word embeddings were applied to the convolutional neural network algorithm.
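A minimal sketch of the two feature representations is shown below. Loading the GoogleNews vectors through gensim and averaging token embeddings are assumptions made for brevity; the paper states only that a pre-trained Word2Vec model was used, and its CNN consumes per-token embedding sequences rather than averaged vectors.

```python
# Sketch of the two feature representations: Bag of Words and Word2Vec embeddings.
# Using gensim to load the GoogleNews vectors is an assumption, not stated in the paper.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from gensim.models import KeyedVectors

# (1) Bag of Words for NB, SVM, and the ANN ("cleaned" comes from the preprocessing sketch).
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(cleaned)          # sparse term-count matrix

# (2) Word embeddings for the CNN, collapsed here to an average vector for brevity.
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def embed(comment: str) -> np.ndarray:
    vecs = [w2v[w] for w in comment.split() if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X_emb = np.vstack([embed(c) for c in cleaned])
```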

Model Creation

We implemented four different machine learning algorithms: naive Bayes (NB), support vector machine (SVM), a convolutional neural network (CNN), and a feed-forward multilayer perceptron, a class of artificial neural network that we refer to as ANN for simplicity. NB is a probabilistic classifier, based on Bayes’ theorem, that produces a probability of a comment being bullying based on the occurrence of words in the comments (Nandhini & Sheeba, 2015 ). The implementation of naive Bayes is known as multinomial naive Bayes, which is appropriate for word counts, such as the use of BoW described previously. This implementation of naive Bayes has a smoothing parameter termed alpha that is used to address instances of zero probability. We left this parameter at its default setting of 1 to allow some probability for all words in each prediction. The SVM algorithm utilizes a kernel function to separate classes in a higher-level dimension if the default dimension of the data is not linearly separable. Furthermore, SVM accomplishes this through the use of hyperplanes and vectors which separate classes (bullying or non-bullying) based on the nearest training data points (i.e., comments least likely to be considered bullying or non-bullying) (Dinakar et al., 2012 ). By focusing on the nearest data points for each class, SVM can find the most optimal decision boundary to separate the data. The CNN and ANN algorithms are both neural networks, or interconnected networks of nodes that mimic a biological neural network arranged in layers (Géron, 2019 ; Minaee et al., 2020 ). The ANN used in this research is a deep neural network where the first layer is an embedding layer and the output layer is binary (bullying or non-bullying) with middle layers known as the hidden layers. The ANN model included two hidden layers, with the first of these hidden layers composed of 15 neurons and the second layer composed of 10 neurons. The neurons within the hidden layer utilized the Rectifier Linear Unit (ReLU) activation function, and the output layer was based on the sigmoid activation function. A CNN relies on filters that traverse through comments via word groupings (2 words, 3 words, and 4 words) to identify key pieces of information that can aid in text classification. The ability of a CNN to group words together makes it unique in its implementation compared to the other algorithms used in this study. The CNN connects to an artificial neural network, which uses the same parameters as the ANN described previously.
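The following Keras sketch reflects the ANN evaluation model as described: two hidden layers of 15 and 10 ReLU neurons and a sigmoid output. Feeding the BoW vector directly into dense layers, as well as the optimizer and loss choices, are simplifying assumptions rather than settings reported in the paper.

```python
# Minimal Keras sketch of the ANN evaluation model: two hidden layers (15 and 10
# ReLU neurons) and a sigmoid output for the binary bullying / non-bullying decision.
# The optimizer and loss are illustrative assumptions.
import tensorflow as tf

def build_ann(input_dim: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(15, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```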

The NB, SVM, and CNN algorithms were used in the consensus filtering methods to evaluate the YouTube dataset and identify unambiguously labeled comments (i.e., comments for which all algorithm predictions agree with the majority decision of the annotators’ labels), and the ANN was used as an independent model solely for the purpose of performance evaluation of the curated versions of the dataset. Each algorithm was chosen due to its success in general text classification tasks (Kowsari et al., 2019; Minaee et al., 2020), as well as text classification related to cyberbullying (Rosa et al., 2018). For all instances of supervised learning, we used an 80–20 split, where 80% of the data was used for training and 20% was used for evaluating the trained models. Python’s scikit-learn library (Pedregosa et al., 2011) was used to split the dataset into training and test sets, to build the BoW representation, and to implement NB and SVM. Both neural networks (ANN and CNN) were implemented using TensorFlow (Abadi et al., 2016). The performance of the ANN on the modified datasets was evaluated using two scoring metrics: accuracy and F-score. In nearly balanced datasets, accuracy, the proportion of correct predictions out of the total number of samples, is a reasonable measure of performance. Recall and precision are typically observed together, where recall measures how many comments in each class were correctly predicted, and precision measures the percentage of predicted comments that actually belong to their respective class. Instead of using recall and precision individually, we use the F-score, the harmonic mean of recall and precision, which provides a more appropriate measure of incorrectly classified cases in an unbalanced setting. More specifically, we use the macro F-score, which is preferable when working with an imbalanced class distribution because it applies the same weight to each class regardless of the number of instances in each class. By considering both accuracy and F-score, we are able to provide a more rounded assessment of the overall efficacy of the models.
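
A minimal example of the 80–20 split and the two metrics, using placeholder data rather than the YouTube comments, might look as follows.

```python
# Evaluation sketch: 80-20 split plus accuracy and macro F-score.
# The feature matrix and labels are random placeholders, not the study's data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))        # toy BoW counts
y = rng.integers(0, 2, size=200)              # toy bullying (1) / non-bullying (0) labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)     # 80% training, 20% testing

pred = MultinomialNB(alpha=1.0).fit(X_train, y_train).predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("macro F-score:", f1_score(y_test, pred, average="macro"))
```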

Modified Dataset Creation

The goal of this research is to create a curated version of the YouTube dataset in order to better understand and identify instances of bullying and non-bullying comments in a large, varied dataset. We do so by focusing on those instances where the consensus filtering models are in unanimous agreement with the majority decision of the annotators’ labels, referred to as unambiguous instances. To do this, we implemented two filtering algorithms, which we refer to as Single-Algorithm Consensus Filtering (SACF) and Multi-Algorithm Consensus Filtering (MACF). A consensus refers to a unanimous labeling result following the application of a filtering method to a given comment.

For both CF methods, the dataset was divided into subsets of unique comments, with one subset reserved for testing and the remaining subsets used for training. The SACF method, which uses a single algorithm (SVM), divides the data into 10 subsets: one is reserved for testing and eight of the remaining nine are used to train a model, because the training process requires eight subsets to retain an 80–20 train-test split. As a result, one subset is left out of each training run. To ensure that the test set is evaluated against every possible model, we used cross-validation and performed nine training iterations, leaving out a different subset in each iteration, which results in nine predictions per comment. Following these repeated evaluations on each test set, we analyzed the classifications generated by the models for all the comments in that test set. If the consensus after nine evaluations matches the MTurk majority annotation for a comment, we reserve that comment for future analysis.
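
The following is one possible reading of the SACF procedure expressed in Python: ten subsets, one held out as the test set, and nine rotations over the remaining subsets so that every comment in the held-out subset receives nine SVM predictions. The fold bookkeeping and the placeholder data are our own assumptions, not the authors' code.

```python
# Interpretation of Single-Algorithm Consensus Filtering (SACF): a sketch of
# the description above, not the authors' exact implementation.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def sacf(X, y, n_subsets=10, seed=0):
    folds = [test for _, test in KFold(n_subsets, shuffle=True,
                                       random_state=seed).split(X)]
    unambiguous = []
    for i, test_idx in enumerate(folds):                  # each subset serves as a test set
        train_folds = [f for j, f in enumerate(folds) if j != i]
        preds = []
        for k in range(len(train_folds)):                 # rotate one subset out -> 9 models
            train_idx = np.concatenate(
                [f for j, f in enumerate(train_folds) if j != k])
            preds.append(SVC().fit(X[train_idx], y[train_idx]).predict(X[test_idx]))
        votes = np.stack(preds, axis=1)                   # nine predictions per comment
        consensus = (votes == votes[:, :1]).all(axis=1)   # unanimous across iterations
        matches = votes[:, 0] == y[test_idx]              # ...and equal to the MTurk label
        unambiguous.extend(test_idx[consensus & matches])
    return np.sort(np.array(unambiguous))

X = np.random.rand(100, 20)                               # placeholder features and labels
y = np.random.randint(0, 2, size=100)
kept_idx = sacf(X, y)                                     # indices retained for SA-CDS
```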

The MACF method relies on three algorithms: NB, SVM, and CNN. Instead of 10 subsets, there are five subsets, where one is reserved for testing and the remaining four are used for training a model. Unlike the SACF approach, there is no rotation of subsets in the training set, since the four remaining subsets already satisfy the 80–20 train-test split and we use the consensus label of the three algorithms (NB, SVM, and CNN) for evaluation. If the predicted labels of all three algorithms match the MTurk label, then the comment is reserved for future analysis. Because it does not rely on the decision of a single algorithm to identify correctly labeled comments, this approach can filter comments in ways that the single-algorithm approach may not.
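
A sketch of the MACF consensus step is shown below. It assumes the 80–20 split described above has already been made and that each model exposes a predict() method returning 0/1 labels; LogisticRegression stands in for the CNN purely to keep the toy example runnable.

```python
# Sketch of the MACF consensus step: keep a comment only when all three
# models predict the MTurk label. Placeholder data and a LogisticRegression
# stand-in for the CNN are assumptions made to keep the example self-contained.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

def macf_consensus(models, X_test, y_mturk):
    preds = np.stack([np.asarray(m.predict(X_test)).ravel() for m in models], axis=1)
    unanimous = (preds == preds[:, :1]).all(axis=1)   # all three algorithms agree
    matches = preds[:, 0] == y_mturk                  # ...and agree with the MTurk label
    return np.where(unanimous & matches)[0]           # indices retained for MA-CDS

X = np.random.randint(0, 3, size=(200, 30))           # placeholder BoW counts and labels
y = np.random.randint(0, 2, size=200)
models = [MultinomialNB().fit(X, y), SVC().fit(X, y),
          LogisticRegression(max_iter=500).fit(X, y)]
kept_idx = macf_consensus(models, X, y)
```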

Resulting Modified Datasets

A modified version of the dataset was created by each of the filtering methods based on their results. We refer to these new versions as SA-CDS and MA-CDS (where CDS stands for Curated Data Set). These datasets were constructed by identifying comments where the filtering methods unanimously agreed with the MTurk label. The two modified versions of the dataset were assembled based on the results from the consensus filtering methods [Single-Algorithm Consensus Filtering (SACF), Multi-Algorithm Consensus Filtering (MACF)] and the MTurk verification process as follows: (1) a version that contains the comments resulting from the consensus agreement between the MTurk label and the SACF method (SA-CDS) and (2) a version that contains the results from the consensus agreement between the MTurk label and the MACF method (MA-CDS).

Modified Dataset Evaluation

An artificial neural network (ANN) was implemented to test classification performance on all datasets. The ANN was used to avoid the algorithmic bias that could arise from evaluating the datasets with one of the algorithms included in the SA or MA consensus filtering methods. To conduct a fair experiment, all datasets were split 80–20 into a training set and a test set, respectively. Each filtering method produced a dataset with a different total number of comments, and the bullying to non-bullying ratio differed for each dataset within the training and test sets due to the uniqueness of the filtering methods’ annotation processes.

Consensus Filtering Label Agreement Results

We hypothesized that the filtering methods would, in many instances, unanimously agree with the majority decision of the MTurk labels; we refer to these as unambiguous instances of non-bullying or bullying. With the SACF method, a unanimous agreement with the MTurk label is the case where the algorithm predicted the same label for a comment across all nine iterations and that label also matched the MTurk label. A unanimous agreement for the MACF method is the case where each of the three algorithms predicted the same label as the MTurk workers’ label. In our analysis, we found that of the 18,735 comments in the YouTube dataset, the SACF method produced labels that unanimously agreed with the MTurk label for 9489 comments and the MACF method produced labels that unanimously agreed with the MTurk label for 9679 comments. We also investigated when the filtering methods produced a predictive label that unanimously disagreed with the MTurk label, although these results are not used as part of the modified datasets. Similar to the unanimous agreement analysis, the number of instances of unanimous disagreement for each filtering method was roughly equivalent, with more occurrences of bullying than non-bullying. The SACF method unanimously disagreed with the MTurk labels of 3365 comments, of which 1060 were originally instances of non-bullying comments and 2305 were originally instances of bullying comments. The MACF method unanimously disagreed with the MTurk labels of 3377 comments, where 1175 originally belonged to the non-bullying class and 2202 originally belonged to the bullying class.

Given that the goal of both filtering methods is to identify unambiguous comments, we analyzed the number of comments that were commonly identified by both filtering methods as correctly labeled. In total, there were 8017 comments in common where both filtering methods unanimously agreed with the MTurk label (1295 bullying and 6722 non-bullying) and 2324 comments in common where both filtering methods unanimously disagreed with the MTurk label (1689 bullying and 635 non-bullying). Further analysis of the 8017 unanimously agreed comments showed that the MTurk workers themselves completely agreed on the label for 4203 of these comments, with 564 belonging to the bullying class and 3639 belonging to the non-bullying class. Similarly, we investigated the 7578 comments that were removed during the filtering process. Of these comments, the MTurk workers all agreed on the same label for 2340 of the comments, with 1128 identified as bullying and 1212 identified as non-bullying.

Modified Dataset Descriptions and Distributions

The overarching goal of this study is to explore those instances of bullying and non-bullying comments identified as unambiguous using the consensus filtering methods and the MTurk labels, and to create a revised version of the dataset which can subsequently be used to develop a machine learning model that more accurately predicts instances of cyberbullying or non-bullying. As described in the “Methods” section, two modified datasets were curated, which we refer to as SA-CDS and MA-CDS, respectively. Alongside these new datasets, we also report MTurk labeling information from the original YouTube dataset, simply termed YouTube. Given that each filtering method is different in its implementation, we hypothesized that each approach would produce different-sized datasets with different ratios of bullying and non-bullying comments. The original dataset contained 18,735 comments, with 6462 labeled as bullying and 12,273 labeled as non-bullying. The curated datasets formed by the filtering methods, SA-CDS and MA-CDS, contained similar numbers of comments, with SA-CDS containing 9489 comments and MA-CDS containing 9679 comments (see Table 1). The bullying to non-bullying ratios were approximately equal, with SA-CDS containing 1721 bullying comments and 7768 non-bullying comments (~1:4.5 ratio) and MA-CDS containing 1835 bullying comments and 7844 non-bullying comments (~1:4.3 ratio) (see Table 1). While the bullying to non-bullying ratios of these modified datasets are notably smaller than the ~1:1.9 bullying to non-bullying ratio of the YouTube dataset, they still adhere to the description of a good dataset as having at least 10% to 20% bullying instances (Salawu et al., 2017).

Artificial Neural Network Evaluation on Modified Datasets

An ANN was used to test classification performance on all datasets, using accuracy and F-score as performance metrics (see Table 2). To establish a baseline, we measured the performance of the ANN on the YouTube dataset and found that it had a classification accuracy of 67% and an F-score of 63%. We then used the ANN to measure the classification performance on the modified datasets. The ANN performed similarly on the SA-CDS and MA-CDS datasets, with an accuracy of 96% on both and F-scores of 93% and 92%, respectively. The two modified datasets demonstrated a strong increase in performance compared to the original YouTube dataset (see Table 2 and Fig. 1), with a 28% increase in accuracy and a 28%–30% increase in F-score.

Figure 1. Accuracy and F-score for all datasets using an artificial neural network (ANN)

The objectives of this research were twofold: (1) to apply a collaborative approach of human labeling with consensus filtering methods to refine a cyberbullying dataset down to unambiguous instances of bullying and non-bullying comments and (2) to investigate whether an independent machine learning model is more performant on the curated datasets. A curated dataset based on unambiguous instances of cyberbullying can be used to develop more performant cyberbullying detection models, which could subsequently be used to make an initial distinction between clear instances of cyberbullying and those cases that may require further analysis or more specific language models. The filtering methods, SACF and MACF, identified roughly the same percentage of both bullying and non-bullying comments, resulting in datasets that, when tested with a completely independent algorithm, produced similar classification performance with high accuracy. This suggests that both filtering methods can be used to curate datasets suitable for developing more performant models. When viewing the commonality of the filtering methods, both approaches unanimously agreed with the MTurk label on 8017 comments (84% of the total comments identified by SACF and 83% of the total comments identified by MACF). Given the differences in the filtering method algorithms and implementations, the high percentage of commonality indicates that both approaches can be used with confidence to curate datasets for creating more performant models. The SACF method relies on one algorithm, SVM, and this may be sufficient to identify unambiguous instances. However, the distinguishing feature of MACF is that each algorithm used has a distinct implementation from the others, which has the potential to mimic how individuals from different backgrounds might view a given comment from dissimilar perspectives. In this approach, if the algorithms’ predictions are in unanimous agreement on a comment’s label, then that may be a strong indication that the annotator label is correct.

Our analysis also found that of the 8017 comments where both filtering methods agreed on the label, a little over half (52%) of these comments had labels where all three MTurk workers were also in complete agreement on the annotation. In contrast, of the 7578 comments removed during the filtering process, there were only 2340 comments (31%) where the MTurk workers completely agreed on the designated annotation. This shows that in cases where both filtering methods identified a comment as unambiguous, there is a higher likelihood that those comments also had complete agreement among the MTurk workers’ annotations. This has implications for further research into models developed on an even more refined version of the curated datasets containing only those comments that are identified as unambiguous by the filtering methods and where the MTurk workers’ annotations are in complete agreement.

When using the ANN to develop a model on the modified datasets, we found a slightly larger increase in performance on the SA-CDS. While the accuracy of the ANN on SA-CDS and MA-CDS was 96%, the test sets used to obtain this metric were imbalanced; therefore, the F-score is a better measure (see Table 2 and Fig. 1). The F-score of the ANN on SA-CDS was 93%, whereas the F-score on MA-CDS was only slightly lower at 92% (see Table 2 and Fig. 1). This similarity in performance is unsurprising given that about 83% of each dataset has comments in common with the other, suggesting that their differences are negligible. It is possible that using a different combination of algorithms with the MACF approach would produce superior results to the ones described in this study. For example, a combination of neural network–based algorithms may be preferable, such as a word embedding CNN (identical to the one used in this work), a character embedding CNN, and a recurrent neural network (Minaee et al., 2020), all of which have shown success in text classification. Another option could be to use an ANN, similar to the one presented in this study, but with different word representation methods (e.g., BoW, character embeddings, word embeddings) and different lengths of word sequences (i.e., different n-grams). Using different word representations provides diverse perspectives that can help in identifying bullying content from different viewpoints, and unanimous agreement across these perspectives indicates confidence in the label.

Although the goal of this research is to refine a cyberbullying dataset down to unambiguous instances of bullying and non-bullying comments and to filter out those comments with potentially uncertain labels, there were instances where the filtering methods unanimously disagreed with the MTurk label. Interestingly, the number of comments where the filtering methods unanimously disagreed with the MTurk labels was similar (SACF: 3365, MACF: 3377), even when considering the ratio of bullying to non-bullying comments (SACF: ~2.17:1, MACF: ~1.87:1). However, unlike the case of unanimous agreement, when viewing the commonality of the filtering methods, both approaches unanimously disagreed with the MTurk label on 2324 comments. This is approximately 69% of the comments that unanimously disagreed with the MTurk labels identified by each filtering approach separately. This class of comments is worth discussing because if the proposed filtering methods are to rely on consensus as a way of identifying unambiguous instances of bullying and non-bullying comments, then this set of identified comments has consistently disputed the MTurk label, and further investigation is needed to more fully understand what is unique about this sub-dataset and what properties cause it to be consistently mislabeled by both filtering approaches. One possibility is that this subset, or some portion of it, was incorrectly labeled by the MTurk workers and the filtering methods have predicted the correct labels. Also notable is that the majority of these comments did not belong to the non-bullying class, which is the dominant class, but rather to the bullying class.

Limitations

A limitation of this study concerns the choice of evaluation algorithm. An ANN was used instead of one of the algorithms from the filtering methods in order to remove the algorithmic bias that can occur when a dataset is evaluated with the same algorithm that helped create it; however, an algorithm with an even more distinct implementation could have further separated the evaluation model from the filtering models. The ANN utilized a BoW approach to represent words, which was also used with naive Bayes and SVM during the filtering process. The ANN could instead have used word embeddings with an n-gram approach, an implementation less directly related to those used in the filtering algorithms. As an alternative, an algorithm such as random forest, which uses majority voting across decision trees to decide on a classification, could have been used in place of the ANN, given that its implementation is substantially different from NB and SVM.

A second limitation of this study centers on the size of the dataset. While the content of the dataset is relatively current (late 2019), its size was limited (approximately 19,000 comments), and machine learning model performance is dependent on the size of the dataset used to train the model. A larger dataset has the benefit of including a more diverse vocabulary than the one developed through the dataset in this study, especially when using a BoW text representation. Continuously expanding the vocabulary to encompass relevant cultural and societal terminology is essential to address the evolving character of cyberbullying, and could produce models with greater relabeling accuracy compared to the models developed in this research.

A third limitation is that the instances where the filtering methods unanimously disagreed with the MTurk label require further analysis. We reported and briefly discussed those results in this study, but we did not perform any testing on this subset, which could provide further insight into what these instances represent and whether a majority of them are simply comments that were incorrectly labeled.

A fourth limitation of this study concerns the process itself. At this point, our strategy relies on comparing the algorithmically agreed results to the original human consensus annotations. Although this reduces the size of the generated dataset and makes it highly sensitive to the original group of annotators, it allows for a deeper understanding and analysis of the type of original data that is consistently classified as bullying or non-bullying. As we improve our understanding of cyberbullying datasets and classification outcomes, a future goal is to remove this dependency on the original annotations while maintaining model accuracy.

Finally, the MTurk label used was based on the majority vote of the three MTurk workers, where if at least two workers labeled a comment as bullying, then the final label is bullying. Our filtering methods depend on a unanimous agreement among the iterations of the single algorithm or among the three algorithms in the multi-algorithm approach. Different results may have been produced if we had only considered instances where there is unanimous agreement between the algorithms as well as among all three workers. This should be investigated further, because it may filter out additional uncertainty and result in a superior dataset for unambiguous cyberbullying detection.

Future Work and Implications

We have shown that using machine learning algorithms, as part of single or multiple filtering approaches, to evaluate a YouTube dataset allowed us to (1) curate modified versions of the dataset with a focus on bullying and non-bullying comments identified as unambiguous while still adhering to the definition of a good cyberbullying dataset, and to (2) create more performant classification models from those datasets, while also gaining insight into the type of data that is consistently classified as bullying or non-bullying. Datasets used to detect cyberbullying using machine learning can contain uncertain data, and this process of creating modified datasets using filtering methods can prove useful as an initial attempt at separating those data that are clear cases of bullying and non-bullying from those that are uncertain and may require further context or expert analysis as part of the identification process.

Given that most online interactions do not occur in a vacuum, a possible enhancement in cyberbullying detection is to incorporate all the elements and context of a comment in a dataset. Does the comment include an image, or is it embedded in one? How do emojis and emoticons affect the direction of the comment? Are there any slang words that could be interpreted in a different way? All of these, individually or in combination, could help improve the accuracy of our algorithms and also limit biases from MTurk or any other human curators. Lastly, we could look at creating hybrid strategies that combine supervised and unsupervised learning methods (Dinakar et al., 2012; Trana et al., 2020), which would allow for feedback loops and greater adaptability of our algorithms without having to retrain them as the datasets grow and evolve.

Another possible application of these strategies is to detect clear cases of cyberbullying in real time. Currently, social media and other online outlets use proprietary machine learning algorithms to flag potentially offensive comments as detailed in their Terms of Service. On some platforms, such as Twitter, users can implement settings where they can review all flagged content first, or choose to have it automatically blocked. This two-step process is similar to what we presented and has the potential to be improved by increasing the accuracy of the methods used to select apparent occurrences of cyberbullying. In a similar manner, we can adapt our strategies to detect accurately labeled comments in domains like politics, science, social issues, and others. The results of this research study suggest an algorithmic framework to formally analyze and initially assess cyberbullying datasets. While human participants are still needed to provide a foundation for annotation, the use of multiple algorithms provides a scaffolding structure that could eventually incorporate unsupervised models that have been trained to recognize cultural colloquialisms and contemporary slang terminology, as well as context, thus addressing the inherent subjectivity of using human annotators. Additionally, the ability to make use of algorithms to dynamically recognize and identify new harmful or malicious content can further reduce the financial obligation required for recruiting human participants to create large-scale comprehensive datasets, thus creating new pathways and opportunities for research on preventing cyberbullying, with an ultimate goal of creating safer online spaces. It is important to note that the goals of these strategies are not to completely replace human decision-making and outperform experts, or to use AI-based methods to police online domains, but rather to help develop clear definitions surrounding harmful commentary and to help recognize human error and bias in data.

Availability of Data and Material

The original YouTube dataset and the combined algorithmic/Amazon Mechanical Turk curated dataset are available upon request.

Code Availability

Code is available by request for reuse and modification as long as the original authors are referenced and the code is not used commercially.

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., & Kudlur, M. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, 16, 265–283.

Ahler, D. J., Roush, C. E., & Sood, G. (2019). The micro-task market for lemons: Data quality on Amazon’s Mechanical Turk. In Meeting of the Midwest Political Science Association .

Allison, K. R., & Bussey, K. (2016). Cyber-bystanding in context: A review of the literature on witnesses’ responses to cyberbullying. Children and Youth Services Review, 65 , 183–194.


Baldasare, A., Bauman, S., Goldman, L., & Robie, A. (2012). Cyberbullying? Voices of college students. In Misbehavior online in higher education . Emerald Group Publishing Limited.

Bayzick, J., Kontostathis, A., & Edwards, L. (2011). Detecting the presence of cyberbullying using computer software.

Brodley, C. E., & Friedl, M. A. (1999). Identifying mislabeled training data. Journal of Artificial Intelligence Research, 11 , 131–167.

Chatzakou, D., Kourtellis, N., Blackburn, J., De Cristofaro, E., Stringhini, G., & Vakali, A. (2017). Mean birds: Detecting aggression and bullying on twitter. In Proceedings of the 2017 ACM on Web Science conference , 13–22.

Cyberbullying (n.d.). In Merriam-Webster’s online dictionary. Available at: http://www.merriam-webster.com/dictionary/cyberbullying . Accessed May 19, 2021.

Dadvar, M., Jong, F. D., Ordelman, R., & Trieschnigg, D. (2012). Improved cyberbullying detection using gender information. In Proceedings of the Twelfth Dutch-Belgian Information Retrieval Workshop (DIR 2012). University of Ghent.

Dadvar, M., Trieschnigg, D., Ordelman, R., & de Jong, F. (2013). Improving cyberbullying detection with user context. European Conference on Information Retrieval (pp. 693–606). Springer.


D’Cruz, P., & Noronha, E. (2018). Abuse on online labour markets: Targets’ coping, power and control. Qualitative Research in Organizations and Management, 13 (1), 53–78. https://doi.org/10.1108/QROM-10-2016-1426

Dinakar, K., Jones, B., Havasi, C., Lieberman, H., & Picard, R. (2012). Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS), 2 (3), 1–30.

Dredge, R., Gleeson, J., & de la Piedad Garcia, X. (2014). Cyberbullying in social networking sites: An adolescent victim’s perspective. Computers in Human Behavior, 36 , 13–20. https://doi.org/10.1016/j.chb.2014.03.026

Hackett, L., Verjee, L., Jones, S., Bauman, S., Smith, R., Everett, H. (2019) Ditch the label: The annual bullying survey (2019). Resource Document. https://www.ditchthelabel.org/wp-content/uploads/2019/11/The-Annual-Bullying-Survey-2019-1.pdf . Accessed October 12, 2020.

Ekambaram, R., Goldgof, D. B., & Hall, L. O. (2017). Finding label noise examples in large scale datasets. In 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE . 2420–2424.

Englander, E., Donnerstein, E., Kowalski, R., Lin, C. A., & Parti, K. (2017). Defining cyberbullying. Pediatrics, 140 (Supplement 2), S148–S151.

Garbe, W. (2020). SymSpell. Github. https://github.com/wolfgarbe/SymSpell . Accessed December 5, 2019.

Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media.

Giumetti, G. W., & Kowalski, R. M. (2016). Cyberbullying matters: Examining the incremental impact of cyberbullying on outcomes over and above traditional bullying in North America. In Cyberbullying across the globe,  117–130. Springer, Cham.

Gordon, S. (2020). Research shows rise in cyberbullying during COVID-19 pandemic. Verywell Family. https://www.verywellfamily.com/cyberbullying-increasing-during-global-pandemic-4845901 . Accessed September 25, 2020.

Guan, D., Yuan, W., Lee, Y. K., & Lee, S. (2011). Identifying mislabeled training data with the aid of unlabeled data. Applied Intelligence, 35 (3), 345–358.

Hinduja, S., & Patchin, J. W. (2015). Bullying beyond the schoolyard: Preventing and responding to cyberbullying. Corwin Press.

Hinduja, S., & Patchin, J. W. (2019a). Connecting adolescent suicide to the severity of bullying and cyberbullying. Journal of School Violence, 18 (3), 333–346.

Hinduja, S., & Patchin, J. W. (2019b). Cyberbullying fact sheet: identification, prevention, and response. Cyberbullying Research Center. https://cyberbullying.org/Cyberbullying-Identification-Prevention-Response-2019.pdf . Accessed January 10, 2020.

Höher, J., Scheithauer, H., & Schultze-Krumbholz, A. (2014). How do adolescents in Germany define cyberbullying? A focus-group study of adolescents from a German major city. Praxis Der Kinderpsychologie Und Kinderpsychiatrie, 63 (5), 361–378.

Hosseinmardi, H., Mattson, S. A., Rafiq, R. I., Han, R., Lv, Q., & Mishra, S. (2015). Detection of cyberbullying incidents on the instagram social network. arXiv preprint arXiv:1503.03909

Kennedy, R., Clifford, S., Burleigh, T., Waggoner, P. D., Jewell, R., & Winter, N. J. (2020). The shape of and solutions to the MTurk quality crisis. Political Science Research and Methods, 8 (4), 614–629.

Kessel Schneider, S., O’Donnell, L., & Smith, E. (2015). Trends in cyberbullying and school bullying victimization in a regional census of high school students, 2006–2012. Journal of School Health, 85 (9), 611–620.

Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., & Brown, D. (2019). Text classification algorithms: A survey. Information, 10 (4), 150.

Langos, C. (2012). Cyberbullying: The challenge to define. Cyberpsychology, Behavior, and Social Networking, 15 (6), 285–289.

Lin, C. H., Mausam, M., & Weld, D. S. (2014). To re (label), or not to re (label). In HCOMP .

Menesini, E., Nocentini, A., Palladino, B. E., Frisén, A., Berne, S., Ortega-Ruiz, R., Calmaestra, J., Scheithauer, H., Schultze-Krumbholz, A., Luik, P., Naruskov, K., Blaya, C., Berthaud, J., & Smith, P. K. (2012). Cyberbullying definition among adolescents: A comparison across six European countries. Cyberpsychology, Behavior, and Social Networking, 15 (9), 455–463.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems , 3111–3119.

Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2020). Deep learning based text classification: a comprehensive review .

Müller, N. M., & Markert, K. (2019). Identifying mislabeled instances in classification datasets. In 2019 International Joint Conference on Neural Networks (IJCNN) IEEE,  1–8.

Nandhini, B. S., & Sheeba, J. I. (2015). Cyberbullying detection and classification using information retrieval algorithm. In Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering & Technology , 1–5.

Nixon, C. L. (2014). Current perspectives: The impact of cyberbullying on adolescent health. Adolescent Health, Medicine and Therapeutics, 5 , 143.

Nocentini, A., Calmaestra, J., Schultze-Krumbholz, A., Scheithauer, H., Ortega, R., & Menesini, E. (2010). Cyberbullying: Labels, behaviours and definition in three European countries. Australian Journal of Guidance and Counselling, 20 (2), 129.

Our range of enforcement options. (2020). Twitter. https://help.twitter.com/en/rules-and-policies/enforcement-options . Accessed September 25, 2020.

Patchin, J. W., & Hinduja, S. (2019). Summary of our cyberbullying research (2007–2019). Retrieved from Cyberbullying Research Center website: https://cyberbullying.org/summary-of-our-cyberbullying-research . Accessed September 25, 2020.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12 , 2825–2830.

Peter, I. K., & Petermann, F. (2018). Cyberbullying: A concept analysis of defining attributes and additional influencing factors. Computers in Human Behavior, 86 , 350–366.

Ptaszyński, M., Leliwa, G., Piech, M., & Smywiński-Pohl, A. (2018). Cyberbullying detection–technical report 2/2018, Department of Computer Science AGH, University of Science and Technology. arXiv preprint arXiv:1808.00926.

Reynolds, K., Kontostathis, A., & Edwards, L. (2011). Using machine learning to detect cyberbullying. In 2011 10th International Conference on Machine Learning and Applications and Workshops IEEE, 2 , 241–244.

Rosa, H., Matos, D., Ribeiro, R., Coheur, L., & Carvalho, J. P. (2018). A “deeper” look at detecting cyberbullying in social networks. In 2018 International Joint Conference on Neural Networks (IJCNN) IEEE,  1–8.

Rosa, H., Pereira, N., Ribeiro, R., Ferreira, P. C., Carvalho, J. P., Oliveira, S., Coheur, L., Paulino, P., Simão, A. M., & Trancoso, I. (2019). Automatic cyberbullying detection: A systematic review. Computers in Human Behavior, 93 , 333–345.

Salawu, S., He, Y., & Lumsden, J. (2017). Approaches to automated detection of cyberbullying: A survey. IEEE Transactions on Affective Computing.

Samami, M., Akbari, E., Abdar, M., Plawiak, P., Nematzadeh, H., Basiri, M. E., & Makarenkov, V. (2020). A mixed solution-based high agreement filtering method for class noise detection in binary classification. Physica A: Statistical Mechanics and its Applications , 124219.

Smith P. K, del Barrio, C., & Tokunaga, R. (2013). Definitions of bullying and cyberbullying: How useful are the terms? In S, Bauman, D, Cross, & J, Walker (Eds) Principles of Cyberbullying Research: Definition, Measures, and Methods,  pp. 29–40. Philadelphia, PA: Routledge.

Steinmetz, K. (2019). Inside Instagram’s war on bullying. Time. https://time.com/5619999/instagram-mosseri-bullying-artificial-intelligence/ . Accessed September 25, 2020.

Sugandhi, R., Pande, A., Agrawal, A., & Bhagat, H. (2016). Automatic monitoring and prevention of cyberbullying. International Journal of Computer Applications, 8 , 17–19.

Suler, J. (2004). The online disinhibition effect. Cyberpsychology & Behavior: The Impact of the Internet, Multimedia and Virtual Reality on Behavior and Society, 7, 321–326.

Trana, R. E., Gomez, C. E., & Adler, R. F. (2020). Fighting cyberbullying: An analysis of algorithms used to detect harassing text found on YouTube. International Conference on Applied Human Factors and Ergonomics (pp. 9–15). Springer.

Vaillancourt, T., Faris, R., & Mishna, F. (2017). Cyberbullying in children and youth: Implications for health and clinical practice. The Canadian Journal of Psychiatry, 62 (6), 368–373.

Vandebosch, H., & Van Cleemput, K. (2008). Defining cyberbullying: A qualitative research into the perceptions of youngsters. Cyberpsychology & Behavior: The Impact of the Internet, Multimedia and Virtual Reality on Behavior and Society, 11(4), 499–503.

Van Hee, C., Lefever, E., Verhoeven, B., Mennes, J., Desmet, B., De Pauw, G., & Hoste, V. (2015). Detection and fine-grained classification of cyberbullying events. In International Conference Recent Advances in Natural Language Processing , 672–680.

Vranjes, I., Baillien, E., Vandebosch, H., Erreygers, S., & De Witte, H. (2017). The dark side of working online: Towards a definition and an emotion reaction model of workplace cyberbullying. Computers in Human Behavior, 69, 324–334.

Wais, P., Lingamneni, S., Cook, D., Fennell, J., Goldenberg, B., Lubarov, D., & Simons, H. (2010). Towards building a high-quality workforce with Mechanical Turk. In Proceedings of Computational Social Science and the Wisdom of Crowds (NIPS), 1–5.

Walker, C. M. (2014). Cyberbullying redefined: An analysis of intent and repetition. International Journal of Education and Social Science, 1 (5), 59–69.

Word2Vec. (2013). Google Code . Document Resource. https://code.google.com/archive/p/word2vec/ . Accessed December 5, 2019.


Acknowledgements

This work was supported by the U.S. Department of Education Title III Award #P031C160209 and the Northeastern Illinois University COR grant (2019–2020). We would also like to thank Dr. Rachel Adler, Amanda Bowers, Akshit Gupta, Sebin Puthenthara Suresh, Luis Rosales, Joanna Vaklin, and Ishita Verma for their participation in this research project.


Author information

Authors and Affiliations

Department of Computer Science, Northeastern Illinois University, 5500 N St. Louis Ave, Chicago, IL, 60625, USA

Christopher E. Gomez, Marcelo O. Sztainberg & Rachel E. Trana


Contributions

All of the authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Christopher Gomez. The first draft of the manuscript was written by Christopher Gomez, and all of the authors commented on subsequent versions of the manuscript. All of the authors read and approved the final manuscript.

Corresponding author

Correspondence to Rachel E. Trana .

Ethics declarations

Ethics Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the Helsinki Declaration of 1975 and its later amendments or comparable ethical standards. The study was approved by the Ethics Committee of Northeastern Illinois University (No. 19–060).

Consent to Participate

Informed consent was obtained from all individual participants included in the study (No. 19–060).

Consent for Publication

Not applicable.

Conflict of Interest

The authors declare no competing interests.

Rights and permissions


About this article

Gomez, C.E., Sztainberg, M.O. & Trana, R.E. Curating Cyberbullying Datasets: a Human-AI Collaborative Approach. Int Journal of Bullying Prevention 4 , 35–46 (2022). https://doi.org/10.1007/s42380-021-00114-6


Accepted: 03 December 2021

Published: 22 December 2021

Issue Date: March 2022


  • Machine learning
  • Cyberbullying
  • Data annotation
  • Supervised learning
  • Consensus filtering

2016 Cyberbullying Data

This study surveyed a nationally-representative sample of 5,700 middle and high school students between the ages of 12 and 17 in the United States. Data were collected between July and October of 2016.

Teen Technology Use - 2016

Teen Technology Use. Cell phones and other mobile devices continue to be the most popular technology utilized by adolescents with the top four reported weekly activities involving their use. Facebook remains the most frequently cited social media platform used on a weekly basis, but Instagram and Snapchat are increasing in popularity. Chat rooms, Tumblr, and Ask.fm remain largely unpopular among this age group.

Cyberbullying Victimization. We define cyberbullying as: “Cyberbullying is when someone repeatedly and intentionally harasses, mistreats, or makes fun of another person online or while using cell phones or other electronic devices.” Approximately 34% of the students in our sample report experiencing cyberbullying in their lifetimes. When asked about specific types of cyberbullying experienced in the previous 30 days, mean or hurtful comments (22.5%) and rumors spread online (20.1%) continue to be among the most commonly cited. Twenty-six percent of the sample reported being cyberbullied in one or more of the eleven specific types reported, two or more times over the course of the previous 30 days.

Cyberbullying Offending. Using the same definition of cyberbullying, approximately 12% of the students in our sample admitted to cyberbullying others at some point in their lifetime. Posting mean comments online was the most commonly reported type of cyberbullying during the previous 30 days (7.1%). About 8% of the sample reported cyberbullying using one or more of the eleven types reported, two or more times over the course of the previous 30 days.

Cyberbullying by Gender. Adolescent girls are significantly more likely to have experienced cyberbullying in their lifetimes (36.7% vs. 30.5%). This gap narrows when reviewing experiences over the previous 30 days. In this sample, boys were significantly more likely to report cyberbullying others during their lifetime (12.7% vs. 10.2%) and in the most recent 30 days (7.7% vs. 4.4%). The type of cyberbullying tends to differ by gender; girls were more likely to say someone spread rumors about them online while boys were more likely to say that someone threatened to hurt them online. This was the first sample we have collected where boys reported significantly more involvement in every type of cyberbullying offending behavior we asked about (results not shown in the chart). In the past, this has varied by type of behavior.

Cyberbullying Victimization Rates by Race, Sex, and Age

Our 2016 survey involved a large enough sample of American middle and high school students that it allows us to extrapolate rates of victimization for various demographic subgroups. For this particular chart, we examined cyberbullying victimization within the last 30 days for three characteristics: race, sex, and age. Read more here: https://cyberbullying.org/cyberbullying-victimization-rates-2016


Methodology

For this study, we contracted with three different online survey research firms to distribute our questionnaire to a nationally-representative sample of middle and high school students. We had four different versions of our survey instrument which allowed us to ask a variety of questions to subsamples of each group. All students were asked questions about experiences with bullying and cyberbullying, digital dating abuse or violence, digital self-harm, sexting, and sextortion. Overall we obtained a 13% response rate, which isn’t amazing, but is higher than most generic Internet surveys.

With any imperfect social science study, caution should be used when interpreting the results. We can be somewhat reassured about the validity of the data, however, because the prevalence rates are in line with results from our previous school-based surveys. Moreover, the large sample size helps to diminish the potential negative effects of outliers. Finally, steps were taken within the survey instrument to ensure valid responses. For example, we asked the respondents to select a specific color among a list of choices and required them to report their age at two different points in the survey, in an effort to guard against computerized responses and thoughtless clicking through the survey.

We want to thank the Digital Trust Foundation for funding this study.

Suggested citation: Patchin, J. W. & Hinduja, S. (2017). 2016 Cyberbullying Data. Cyberbullying Research Center. https://cyberbullying.org/2016-cyberbullying-data

Select publications from this data set:

Hinduja, S. & Patchin, J. W. (2022). Bullying and Cyberbullying Offending: The Influence of Six Parenting Dimensions Among US Youth. Journal of Child and Family Studies, 31 , 1454-1473.

Hinduja, S. & Patchin, J. W. (2021). Digital Dating Abuse Among a National Sample of U.S. Youth. Journal of Interpersonal Violence, 26 (23-24), 11088-11108.

Lee, C., Patchin, J. W., Hinduja, S., & Dischinger, A. (2020). Bullying and Delinquency: The Impact of Anger and Frustration. Violence and Victims, 35 (4), 503-523.

Patchin, J. W. & Hinduja, S. (2020). It is Time to Teach Safe Sexting. Journal of Adolescent Health, 66(2), 140-143.

Patchin, J. W. & Hinduja, S. (2020). Sextortion Among Adolescents: Results from a National Survey of U.S. Youth. Online First in Sexual Abuse: A Journal of Research and Treatment, 32(1), 30-54 .

Patchin, J. W. & Hinduja, S. (2019). The Nature and Extent of Sexting Among Middle and High School Students. Archives of Sexual Behavior, 48 (8), 2333-2343.

Hinduja, S. & Patchin, J. W. (2019). Connecting Adolescent Suicide to the Severity of Bullying and Cyberbullying. Journal of School Violence, 18 (3), 333-346.

Patchin, J. W. & Hinduja, S. (2018). Deterring teen bullying: Assessing the impact of perceived punishment from police, schools, and parents. Youth Violence and Juvenile Justice , 16(2), 190-207.

Patchin, J. W. & Hinduja, S. (2017). Digital self-harm among adolescents. Journal of Adolescent Health , 61, 761-766.

Hinduja, S. & Patchin, J. W. (2017). Cultivating youth resilience to prevent bullying and cyberbullying victimization. Child Abuse & Neglect , 73, 51-62.

Blog posts based on this data set:

May 13, 2024 – Digital Self-Harm: The Growing Problem You’ve Never Heard Of

October 4, 2023 – Cyberbullying Continues to Rise among Youth in the United States

October 14, 2022 – Cyberbullying Among Asian American Youth Before and During the COVID-19 Pandemic

August 2, 2022 – Digital Self-Harm and Suicidality among Middle and High School Students

April 13, 2022 – Michigan Teen Latest Casualty of Sextortion

February 1, 2022 – The Role of Parents in Preventing Bullying and Cyberbullying

September 29, 2021 – Bullying During the COVID-19 Pandemic

September 16, 2020 – Bullying and Cyberbullying: The Connection to Delinquency

November 19, 2019 – Sextortion: More Insight Into the Experiences of Youth

July 17, 2019 – Youth Sexting in the US: New Paper in Archives of Sexual Behavior

May 29, 2019 – School Bullying Rates Increase by 35% from 2016 to 2019

January 7, 2019 – Cyberbullying Victimization Rates by Race, Sex, and Age

October 23, 2018 – Authoritative School Climate: The Next Step in Helping Students Thrive

October 3, 2018 – Sextortion Among Adolescents

September 20, 2018 – Are “Gamers” More Likely to be “Bullies”?

May 15, 2018 – Student Experiences with Reporting Cyberbullying

January 8, 2018 – Most Teenagers Aren’t Asking for Nude Photos

December 1, 2017 – Teens Talk: What Works to Stop Cyberbullying

October 3, 2017 – Digital Self-Harm: The Hidden Side of Adolescent Online Aggression

June 2, 2017 – More on the Link between Bullying and Suicide

February 24, 2017 – New Teen Sexting Data

January 3, 2017 – Millions of Students Skip School Each Year Because of Bullying

November 28, 2016 – Cultivating Resilience To Prevent Bullying and Cyberbullying

October 10, 2016 – New National Bullying and Cyberbullying Data

Related posts

data presentation about cyberbullying

Comments

Does the bullying that goes on in the political arena, most recently on Twitter, count as "cyberbullying," or is it acceptable because the people involved are politicians?

Is there a correlation between the frequency of bullying and abuse and/or neglect in the home of the adolescent who is the bully?

Hi Julie, this is something we have not yet studied. Hopefully a researcher in the field of social work can tackle this question because, like you, I think it's super important to look into.

Which questionnaire did you use, and which questions were asked?



RELATED RESOURCES

  1. Cyberbullying Data 2019

    Presents data on cyberbullying from a 2019 national survey of middle and high school students in the United States. The study surveyed a nationally representative sample of 4,972 students between the ages of 12 and 17; data were collected in April 2019.

  2. 2023 Cyberbullying Data

    This study surveyed a nationally representative sample of 5,005 middle and high school students between the ages of 13 and 17 in the United States; data were collected in May and June of 2023.

  3. 2021 Cyberbullying Data

    Approximately 14% of the students in our 2021 sample admitted to cyberbullying others at some point in their lifetime. Posting mean comments online was the most commonly reported type of cyberbullying during the previous 30 days (5.2%). About 6.5% of the sample reported cyberbullying using one or more of the eleven types reported ...

  4. Teens and Cyberbullying 2022

    Some 32% of teen girls have experienced two or more types of online harassment asked about in this survey, while 24% of teen boys say the same. And 15- to 17-year-olds are more likely than 13- to 14-year-olds to have been the target of multiple types of cyberbullying (32% vs. 22%). These differences are largely driven by older teen girls: 38% ...

  5. Cyber bullying

    Cyber bullying - Statistics & Facts. Cyberbullying is a form of harassment in digital communication mediums, such as text messages, internet forums, chat rooms, and social media. As opposed to ...

  6. A Majority of Teens Have Experienced Some Form of Cyberbullying

    For the latest survey data on teens and cyberbullying, see " Teens and Cyberbullying 2022." Name-calling and rumor-spreading have long been an unpleasant and challenging aspect of adolescent life. But the proliferation of smartphones and the rise of social media has transformed where, when and how bullying takes place.

  7. Cyberbullying Among Adolescents and Children: A Comprehensive Review of

    For the prevalence of cyberbullying victimization and perpetration, the data were reported in 18 and 14 studies, respectively. ...

  8. PDF Identifying Cyberbullying and Responding to Mental Health Consequences

    Cyberbullying is associated with aggressive acts, substance use, delinquency, and suicidal behavior, as well as emotional impacts such as anger, sadness, frustration, and embarrassment. The strongest associations with cyberbullying victimization include stress and suicidal ideation; it is also associated with internalizing symptoms of depression and anxiety and with both suicidal ideation and behavior ...

  9. Cyberbullying: What is it and how can you stop it?

    Cyberbullying can happen anywhere with an internet connection. While traditional, in-person bullying is still more common, data from the Cyberbullying Research Center suggest about 1 in every 4 teens has experienced cyberbullying, and about 1 in 6 has been a perpetrator. About 1 in 5 tweens, or kids ages 9 to 12, has been involved in cyberbullying (PDF, 5.57MB).

  10. Cyberbullying: Twenty Crucial Statistics for 2024

    Cyberbullying affects more than just kids. A 2020 study found that 44 percent of all internet users in the U.S. have experienced online harassment, which can be considered a type of cyberbullying. The most common type of online harassment was name-calling, which made up 37 percent of all harassment.

  11. Cyberbullying and its influence on academic, social, and emotional

    Two instruments were used to collect data. The first was the Revised Cyber Bullying Survey (RCBS; Kowalski and Limber, 2007), a 32-item questionnaire with a Cronbach's alpha ranging from .74 to .91, designed to measure the incidence, frequency, and medium used to perpetrate cyberbullying. The frequency was investigated using a 5-item scale with ... (A brief sketch of how Cronbach's alpha is computed from item scores appears after this list.)

  12. Summary of Our Cyberbullying Research (2004-2022)

    When it comes to more recent experiences, an average of about 13% of students have been cyberbullied across all of our studies within the 30 days prior to the survey. There does appear to be a trend over the last several years of this rate increasing steadily. For offending, across all of our studies, 6% of students admit to cyberbullying ...

  13. Prevalence and related risks of cyberbullying and its effects on

    Data presentation and statistical analysis: simple tabulation frequencies were used to give a general overview of the data, the prevalence of cyberbullying was presented with 95% confidence intervals, and the chi-squared test was performed to determine the associations between individual categorical variables and mental health. (A worked example of this kind of analysis appears after this list.)

  14. Cyberbullying Statistics and Facts for 2024

    Older data on cyberbullying include the following: Most teenagers (over 80%) now use a mobile device regularly, opening them up to new avenues for bullying. (Source: Bullying Statistics) Half of all young adults have experienced cyberbullying in some form. A further 10-20% reported experiencing it regularly. (Source: Bullying Statistics)

  15. What Is Cyberbullying

    Cyberbullying includes sending, posting, or sharing negative, harmful, false, or mean content about someone else. It can include sharing personal or private information about someone else, causing embarrassment or humiliation. Some cyberbullying crosses the line into unlawful or criminal behavior. The most common places where cyberbullying ...

  16. Curating Cyberbullying Datasets: a Human-AI Collaborative Approach

    The challenge of reaching consensus on a common definition of cyberbullying, even among subject matter experts, affects the labeling of cyberbullying datasets and, subsequently, the algorithms and models derived from those data. Cyberbullying datasets are frequently labeled by human participants who may have little formal training or context ... (A small inter-annotator agreement sketch appears after this list.)

  17. PDF Interpreting Bullying/Cyberbullying School Climate Survey Data

    Guiding questions are organized by data type (bullying scale scores and item-level bullying data), with initial and deeper guiding questions about bullying for districts (Appendix A) and for schools (Appendix B). For ease of presentation, the term bullying is used instead of bullying/cyberbullying in this guide; in all instances, both types of ...

  18. Cyberbullying Statistics 2021

    Provides updated statistics on cyberbullying from a national study on US youth by age, gender, sexual orientation, and race. In May-June 2021, we collected new data from a nationally representative sample of 2,546 US youth between the ages of 13 and 17 to better understand their positive and negative experiences online.

  19. Cyberbullying: Dealing with Online Bullies

    Unlike traditional bullying, cyberbullying isn't limited to schoolyards, street corners, or workplaces, but can occur anywhere via smartphones, tablets, and computers, 24 hours a day, seven days a week. Cyberbullies don't require face-to-face contact and their bullying isn't limited to just a handful of witnesses at a time.

  20. Cyberbullying Research Center

    Cyberbullying presents a dangerous threat in today's digital world to youth and adults alike. Access up-to-date resources and research on cyberbullying for parents, educators, students, non-profits, and tech companies. Read victim stories, learn about cyberbullying laws, and download relevant tips and strategies.

  21. Curating Cyberbullying Datasets: a Human-AI Collaborative ...

    Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated, reliable, relevant, and diverse datasets ...

  22. Free Google Slides and PPT Templates about bullying

    If you are giving an informative speech on cyberbullying and need to present data in a visual way, take a look at our editable infographics. Download the Action Guide Against School Bullying presentation for PowerPoint or Google Slides. The education sector constantly demands dynamic and effective ways to ...

  23. 2016 Cyberbullying Data

    Presents findings and statistics on cyberbullying from a 2016 national survey of middle and high school students.
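
Worked example for the Revised Cyber Bullying Survey excerpt above: the reliability figure quoted there (Cronbach's alpha of .74 to .91) summarizes how consistently a set of survey items measures the same construct. The sketch below is illustrative only; it is not the RCBS authors' code, and the response matrix is invented.

    # Minimal sketch of Cronbach's alpha from an items-by-respondents score matrix.
    # Not the RCBS authors' code; the data below are hypothetical.
    import numpy as np

    def cronbach_alpha(scores):
        """scores: 2-D array, rows = respondents, columns = survey items."""
        scores = np.asarray(scores, dtype=float)
        n_items = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)      # variance of each item
        total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the total scale score
        return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical example: 5 respondents answering a 4-item frequency scale (0-4).
    responses = np.array([
        [0, 1, 0, 1],
        [2, 2, 3, 2],
        [1, 1, 1, 0],
        [4, 3, 4, 4],
        [0, 0, 1, 0],
    ])
    print(round(cronbach_alpha(responses), 2))

Values near 1 indicate that the items behave consistently; the published range of .74 to .91 sits in the band that is commonly described as acceptable to excellent reliability.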
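
Worked example for the "data presentation and statistical analysis" excerpt above: a 95% confidence interval around a prevalence estimate and a chi-squared test of association between cyberbullying involvement and a mental-health outcome. All counts are made up for illustration, and the simple normal-approximation interval stands in for whatever interval method the cited study actually used.

    # Illustrative prevalence CI and chi-squared test; all counts are invented.
    import numpy as np
    from scipy import stats

    # 95% CI for a prevalence estimate using the normal (Wald) approximation.
    n_victims, n_total = 230, 1000
    p_hat = n_victims / n_total
    se = np.sqrt(p_hat * (1 - p_hat) / n_total)
    z = stats.norm.ppf(0.975)
    print(f"Prevalence = {p_hat:.1%}, 95% CI = ({p_hat - z*se:.1%}, {p_hat + z*se:.1%})")

    # Chi-squared test: cyberbullied (yes/no) vs. reported anxiety symptoms (yes/no).
    contingency = np.array([[120, 110],   # cyberbullied: anxiety yes / anxiety no
                            [260, 510]])  # not cyberbullied: anxiety yes / anxiety no
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
    print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")

A study following this approach would repeat the chi-squared test for each categorical variable of interest and report the prevalence estimates alongside their intervals.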
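
Worked example for the dataset-curation excerpts above: when human annotators disagree about what counts as cyberbullying, one common way to quantify that disagreement is an inter-annotator agreement statistic such as Cohen's kappa. The labels below are hypothetical, and the cited papers do not necessarily use this exact metric.

    # Illustrative inter-annotator agreement check; labels are hypothetical.
    from sklearn.metrics import cohen_kappa_score

    # Two annotators label the same ten messages (1 = cyberbullying, 0 = not).
    annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    annotator_b = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level agreement

Low agreement on a sample like this would signal that the labeling guidelines, or the definition of cyberbullying itself, need to be tightened before the labels are used to train a model.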