
Career Feature
10 June 2024

What the science of elections can reveal in this super-election year

Benjamin Plackett

Benjamin Plackett is a freelance science journalist in Portsmouth, UK.


Nearly one billion Indians registered to vote ahead of the 2024 election, which saw Prime Minister Narendra Modi return for a historic third term. Credit: Elke Scholiers/Getty

By the end of this year, voters in some 65 countries and regions will have gone to the polls. That means close to half of the global population will have had the chance to cast a ballot of some sort, including almost 360 million people across the European Union. Not all of the world’s political procedures will be free and fair, but this year is still expected to represent the biggest manifestation of the democratic process in history. The geopolitical landscape of 2025 could therefore look very different from that of today, and it will have an impact not just on how science is funded, but also on which international collaborations will flourish or flounder.


doi: https://doi.org/10.1038/d41586-024-01712-2



Mapping Election Administration + Election Science


This project comes at a pivotal time in the development of election administration and the study of election science in the United States. 

As public attention increasingly shifts to the nuts and bolts of running elections, the need for unbiased empirical research to inform decision-makers cannot be overstated. Yet the field is still relatively new and comparatively small, especially considering the disproportionate impact election administration has on the workings of our democracy. More work is needed to ensure a robust, useful evidence base is available to guide election officials, policymakers, and the non-profit community. 

The Mapping Election Administration and Election Science project is an assessment of the current state of knowledge and practice in seven key areas of election administration. Together with stakeholders from across the field, we hope to reach conclusions about areas where the existing evidence about best policies and practices is clear and also generate a research agenda for the future. 

To help with this consensus-building process, MEDSL commissioned seven white papers from experienced scholars in the field. Each of these white papers covers an area of administration that is critical to the convenience, security, and accuracy of elections in the United States. The papers map out:

the state of the field in the assigned area,

important empirical claims about where there is consensus about best practices, and 

critical needs for research that would close the gap between scientific knowledge and needs in the field.

This project is supported by the  Election Trust Initiative .

The white papers cover seven individual topics: 

Voting in person

Voting by mail

Voter registration accuracy and security

Poll worker and election official recruitment, training, and retention

Usability and accessibility

Audits and validating election results

Communicating with voters to build trust in the system

All seven are published and available on the MIT Election Lab website at the link below. Brief summaries of each paper are also available. 

Video Summaries

In addition to the full papers and summaries, we are pleased to offer short videos featuring the authors of each paper. These videos provide a short introduction to the topic, existing research, and future opportunities that each team identified. 

Reference Lists

We are also happy to make available the full reference list for each topic. These lists represent a fairly comprehensive review of the existing literature on each issue, and we look forward to keeping them updated as new resources become available.

To open up the conversation to a wider array of experts, we were delighted to host a two-day convening in September that brought together academics, election officials, policymakers, technologists, and advocates to discuss the current state of election administration and election science research. With drafts of these white papers as a common foundation for the discussion, participants from around the U.S. built consensus around topics for future work and explored opportunities for collaboration between the researchers and practitioners in attendance. 

As this project progresses, we hope to provide further opportunities for the public to provide feedback. Check back for more information if you are interested in participating!

Forecasting Elections: Voter Intentions versus Expectations

Justin Wolfers, Nonresident Senior Fellow, Economic Studies, and David Rothschild

November 1, 2012

Most pollsters base their election projections on questions of voter intentions, which ask “If the election were held today, who would you vote for?” By contrast, we probe the value of questions probing voters’ expectations, which typically ask: “Regardless of who you plan to vote for, who do you think will win the upcoming election?” We demonstrate that polls of voter expectations consistently yield more accurate forecasts than polls of voter intentions. A small-scale structural model reveals that this is because we are polling from a broader information set, and voters respond as if they had polled twenty of their friends. This model also provides a rational interpretation for why respondents’ forecasts are correlated with their intentions. We also show that we can use expectations polls to extract accurate election forecasts even from extremely skewed samples.

I. Introduction

Since the advent of scientific polling in the 1930s, political pollsters have asked people whom they intend to vote for; occasionally, they have also asked who they think will win. Our task in this paper is long overdue: we ask which of these questions yields more accurate forecasts. That is, we compare the predictive power of questions probing voters’ intentions with that of questions probing their expectations. Judging by the attention paid by pollsters, the press, and campaigns, the conventional wisdom appears to be that polls of voters’ intentions are more accurate than polls of their expectations.

Yet there are good reasons to believe that asking about expectations yields greater insight. Survey respondents may possess much more information about the upcoming political race than that probed by the voting intention question. At a minimum, they know their own current voting intention, so the information set feeding into their expectations will be at least as rich as that captured by the voting intention question. Beyond this, they may also have information about the current voting intentions—both the preferred candidate and probability of voting—of their friends and family. So too, they have some sense of the likelihood that today’s expressed intention will be changed before it ultimately becomes an election-day vote. Our research is motivated by the idea that the richer information embedded in these expectations data may yield more accurate forecasts.

We find robust evidence that polls probing voters’ expectations yield more accurate predictions of election outcomes than the usual questions asking about whom they intend to vote for. By comparing the performance of these two questions only when they are asked of the exact same people in exactly the same survey, we effectively difference out the influence of all other factors. Our primary dataset consists of all the state-level presidential Electoral College races from 1952 to 2008 in which both the intention and the expectation question were asked. In the 77 cases in which the intention and expectation questions predict different candidates, the expectation question picked the winner 60 times, while the intention question picked the winner only 17 times. That is, 78% of the time that these two approaches disagree, the expectation data were correct. We can also assess the relative accuracy of the two methods by assessing the extent to which each can be informative in forecasting the final vote share; we find that relying on voters’ expectations rather than their intentions yields substantial and statistically significant increases in forecasting accuracy. An optimally weighted average puts over 90% weight on the expectations-based forecasts. Once one knows the results of a poll of voters’ expectations, there is very little additional information left in the usual polls of voting intentions. Our findings remain robust to correcting for an array of known biases in voter intentions data.
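The head-to-head comparison described above reduces to a simple tally over the races where the two questions pick different candidates. A minimal sketch with made-up race outcomes (not the paper's actual data):

```python
# Each tuple: (intention_pick, expectation_pick, actual_winner).
# The races below are invented for illustration.
races = [
    ("A", "B", "B"),
    ("A", "B", "A"),
    ("B", "B", "B"),
    ("A", "A", "A"),
    ("B", "A", "A"),
]

# Only races where the two questions disagree are informative for the
# head-to-head comparison; everywhere else both methods score identically.
disagreements = [r for r in races if r[0] != r[1]]
exp_correct = sum(1 for i, e, w in disagreements if e == w)
int_correct = sum(1 for i, e, w in disagreements if i == w)

print(f"disagreements: {len(disagreements)}")
print(f"expectation question correct: {exp_correct}")
print(f"intention question correct: {int_correct}")
```

Applied to the paper's 77 disagreement cases, this tally is what yields the 60-to-17 split quoted above.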

The better performance of forecasts based on asking voters about their expectations rather than their intentions varies somewhat, depending on the specific context. The expectations question performs particularly well when voters are embedded in heterogeneous (and thus informative) social networks; when they don’t rely too much on common information; when small samples are involved (the extra information elicited by asking about expectations counters the large sampling error in polls of intentions); and at a point in the electoral cycle when voters are sufficiently engaged as to know what their friends and family are thinking.

Our findings also speak to several existing strands of research within election forecasting. A literature has emerged documenting that prediction markets tend to yield more accurate forecasts than polls (Wolfers and Zitzewitz, 2004; Berg, Nelson and Rietz, 2008). More recently, Rothschild (2009) has updated these findings in light of the 2008 Presidential and Senate races, showing that forecasts based on prediction markets yielded systematically more accurate forecasts of the likelihood of Obama winning each state than did the forecasts based on aggregated intention polls compiled by Nate Silver for the website FiveThirtyEight.com. One hypothesis for this superior performance is that because prediction markets ask traders to bet on outcomes, they effectively ask a different question, eliciting the expectations rather than intentions of participants. If correct, this suggests that much of the accuracy of prediction markets could be obtained simply by polling voters on their expectations, rather than intentions.

These results also speak to the possibility of producing useful forecasts from non-representative samples (Robinson, 1937), an issue of renewed significance in the era of expensive-to-reach cellphones and cheap online survey panels. Surveys of voting intentions depend critically on being able to poll representative cross-sections of the electorate. By contrast, we find that surveys of voter expectations can still be quite accurate, even when drawn from non-representative samples. The logic of this claim comes from the difference between asking about expectations, which may not systematically differ across demographic groups, and asking about intentions, which clearly do. Again, the connection to prediction markets is useful, as Berg and Rietz (2006) show that prediction markets have yielded accurate forecasts, despite drawing from an unrepresentative pool of overwhelmingly white, male, highly educated, high income, self-selected traders.

While questions probing voters’ expectations have been virtually ignored by political forecasters, they have received some interest from psychologists. In particular, Granberg and Brent (1983) document wishful thinking, in which people’s expectation about the likely outcome is positively correlated with what they want to happen. Thus, people who intend to vote Republican are also more likely to predict a Republican victory. This same correlation is also consistent with voters preferring the candidate they think will win, as in bandwagon effects, or gaining utility from being optimistic. We re-interpret this correlation through a rational lens, in which the respondents know their own voting intention with certainty and have knowledge about the voting intentions of their friends and family.

Our alternative approach to political forecasting also provides a new narrative of the ebb and flow of campaigns, which should inform ongoing political science research about which events really matter. For instance, through the 2004 campaign, polls of voter intentions suggested a volatile electorate as George W. Bush and John Kerry swapped the lead several times. By contrast, polls of voters’ expectations consistently showed that Bush was expected to win re-election. Likewise in 2008, despite volatility in the polls of voters’ intentions, Obama was expected to win in all of the last 17 expectations polls taken over the final months of the campaign. And in the 2012 Republican primary, polls of voters’ intentions at different points showed Mitt Romney trailing Donald Trump, then Rick Perry, then Herman Cain, then Newt Gingrich and then Rick Santorum, while polls of expectations showed him consistently as the likely winner.

We believe that our findings provide tantalizing hints that similar methods could be useful in other forecasting domains. Market researchers ask variants of the voter intention question in an array of contexts, posing questions that elicit a respondent’s preference for one product over another. Likewise, indices of consumer confidence are partly based on the stated purchasing intentions of consumers, rather than their expectations about the purchasing conditions in their community. The same insight that motivated our study—that people also have information on the plans of others—is also likely relevant in these other contexts. Thus, it seems plausible that survey research in many other domains may also benefit from paying greater attention to people’s expectations than to their intentions.

The rest of this paper proceeds as follows. In Section II, we describe our first cut of the data, illustrating the relative success of the two approaches to predicting the winner of elections. In Sections III and IV, we focus on evaluating their respective forecasts of the two-party vote share. Initially, in Section III, we provide what we call naïve forecasts, which follow current practice by major pollsters; in Section IV we produce statistically efficient forecasts, taking account of the insights of sophisticated modern political scientists. Section V provides out-of-sample forecasts based on the 2008 election. Section VI extends the assessment to a secondary data source which required substantial archival research to compile. In Section VII, we provide a small structural model which helps explain the higher degree of accuracy obtained from surveys of voter expectations. Section VIII characterizes the type of information that is reflected in voters’ expectations, arguing that it is largely idiosyncratic, rather than the sort of common information that might come from the mass media. Section IX assesses why it is that people’s expectations are correlated with their intentions. Section X uses this model to show how we can obtain surprisingly accurate expectation-based forecasts with non-representative samples. We then conclude. To be clear about the structure of the argument: in the first part of the paper (through Section IV) we simply present two alternative forecasting technologies and evaluate them, showing that expectations-based forecasts outperform those based on traditional intentions-based polls. We present these data without taking a strong position on why. But in later sections we turn to assessing what explains this better performance. Because this assessment is model-based, our explanations are necessarily based on auxiliary assumptions (which we spell out).

We now begin with our simplest and most transparent comparison of the forecasting ability of our two competing approaches.

Download the full paper » (PDF)


Democratic Backsliding in the World’s Largest Democracy

47 pages. Posted: 25 Jul 2023; last revised: 6 Feb 2024.

Sabyasachi Das

Ashoka University; Gokhale Institute of Politics and Economics

Date Written: January 31, 2024

Erosion of trust in the honesty of elections and the concomitant weakening of democratic institutions and practices are growing concerns in modern global politics. This paper contributes to the discussion by detecting and examining a rare electoral irregularity observed in the 2019 general election in India: the incumbent party won disproportionately more seats than it lost in closely contested constituencies. To examine whether this is due to electoral manipulation or effective campaigning by the ruling party, the paper tests for endogenous sorting of close-election constituencies across the win-margin threshold by applying the regression discontinuity design and other methods to several unique datasets. The evidence presented is consistent with electoral manipulation and is less supportive of the campaigning hypothesis. Manipulation appears to take the form of targeted deletion of the names of voters from, and electoral discrimination against, India’s largest minority group, Muslims, partly facilitated by weak monitoring by election observers. The results represent a worrying development for the future of the world’s largest democracy.
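The sorting test the abstract describes compares how many constituencies fall just on each side of the win-margin threshold: under clean elections the incumbent's margin should be roughly as likely to land just below zero as just above it, so a surplus of narrow wins over narrow losses is the red flag. A toy simulation of that intuition, with synthetic margins and an invented bandwidth (the paper itself uses formal McCrary-style density tests, not this crude count):

```python
import random

random.seed(0)
# Simulate incumbent win margins: a symmetric baseline plus an artificial
# bump of "manipulated" narrow wins just above zero.
margins = [random.gauss(0, 0.10) for _ in range(2000)]
margins += [random.uniform(0.0, 0.01) for _ in range(80)]  # excess narrow wins

bandwidth = 0.01  # illustrative window around the threshold
narrow_wins = sum(1 for m in margins if 0 <= m < bandwidth)
narrow_losses = sum(1 for m in margins if -bandwidth <= m < 0)

ratio = narrow_wins / max(narrow_losses, 1)
# A ratio well above 1 is the kind of asymmetry that flags possible sorting.
print(f"narrow wins: {narrow_wins}, narrow losses: {narrow_losses}, "
      f"ratio: {ratio:.2f}")
```

In the real analysis, establishing that such an asymmetry reflects manipulation rather than superior campaigning is exactly what the regression discontinuity design and the auxiliary datasets are for.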

Keywords: Electoral fraud, precise control, democracy, economics of religion

JEL Classification: D72, D73, P00, Z12



Key things to know about U.S. election polling in 2024


Confidence in U.S. public opinion polling was shaken by errors in 2016 and 2020. In both years’ general elections, many polls underestimated the strength of Republican candidates, including Donald Trump. These errors laid bare some real limitations of polling.

In the midterms that followed those elections, polling performed better. But many Americans remain skeptical that it can paint an accurate portrait of the public’s political preferences.

Restoring people’s confidence in polling is an important goal, because robust and independent public polling has a critical role to play in a democratic society. It gathers and publishes information about the well-being of the public and about citizens’ views on major issues. And it provides an important counterweight to people in power, or those seeking power, when they make claims about “what the people want.”

The challenges facing polling are undeniable. In addition to the longstanding issues of rising nonresponse and cost, summer 2024 brought extraordinary events that transformed the presidential race. The good news is that people with deep knowledge of polling are working hard to fix the problems exposed in 2016 and 2020, experimenting with more data sources and interview approaches than ever before. Still, polls are more useful to the public if people have realistic expectations about what surveys can do well – and what they cannot.

With that in mind, here are some key points to know about polling heading into this year’s presidential election.

Probability sampling (or “random sampling”). This refers to a polling method in which survey participants are recruited using random sampling from a database or list that includes nearly everyone in the population. The pollster selects the sample. The survey is not open for anyone who wants to sign up.

Online opt-in polling (or “nonprobability sampling”). These polls are recruited using a variety of methods that are sometimes referred to as “convenience sampling.” Respondents come from a variety of online sources such as ads on social media or search engines, websites offering rewards in exchange for survey participation, or self-enrollment. Unlike surveys with probability samples, people can volunteer to participate in opt-in surveys.

Nonresponse and nonresponse bias. Nonresponse is when someone sampled for a survey does not participate. Nonresponse bias occurs when the pattern of nonresponse leads to error in a poll estimate. For example, college graduates are more likely than those without a degree to participate in surveys, leading to the potential that the share of college graduates in the resulting sample will be too high.

Mode of interview. This refers to the format in which respondents are presented with and respond to survey questions. The most common modes are online, live telephone, text message and paper. Some polls use more than one mode.

Weighting. This is a statistical procedure pollsters perform to make their survey align with the broader population on key characteristics like age, race, etc. For example, if a survey has too many college graduates compared with their share in the population, people without a college degree are “weighted up” to match the proper share.
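The "weighted up" adjustment described above can be sketched in a few lines: each respondent is weighted by the ratio of their group's population share to its sample share. The shares below are invented for illustration, not real benchmarks:

```python
# A sample where college graduates are overrepresented (60% vs. an assumed
# 38% population share -- both figures are made up for this sketch).
sample = ["college"] * 60 + ["no_college"] * 40
population_share = {"college": 0.38, "no_college": 0.62}

sample_share = {g: sample.count(g) / len(sample) for g in population_share}
# Weight = population share / sample share for the respondent's group.
weights = [population_share[g] / sample_share[g] for g in sample]

# After weighting, each group's weighted share matches the population.
weighted_college = sum(w for g, w in zip(sample, weights) if g == "college")
print(weighted_college / sum(weights))
```

College graduates each get a weight below 1 and non-graduates a weight above 1, so the weighted sample reproduces the population's 38/62 split.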

How are election polls being conducted?

Pollsters are making changes in response to the problems in previous elections. As a result, polling is different today than in 2016. Most U.S. polling organizations that conducted and publicly released national surveys in both 2016 and 2022 (61%) used methods in 2022 that differed from what they used in 2016. And change has continued since 2022.

A sand chart showing that, as the number of public pollsters in the U.S. has grown, survey methods have become more diverse.

One change is that the number of active polling organizations has grown significantly, indicating that there are fewer barriers to entry into the polling field. The number of organizations that conduct national election polls more than doubled between 2000 and 2022.

This growth has been driven largely by pollsters using inexpensive opt-in sampling methods. But previous Pew Research Center analyses have demonstrated how surveys that use nonprobability sampling may have errors twice as large, on average, as those that use probability sampling.

The second change is that many of the more prominent polling organizations that use probability sampling – including Pew Research Center – have shifted from conducting polls primarily by telephone to using online methods, or some combination of online, mail and telephone. The result is that polling methodologies are far more diverse now than in the past.

(For more about how public opinion polling works, including a chapter on election polls, read our short online course on public opinion polling basics.)

All good polling relies on a statistical adjustment called “weighting,” which makes sure that the survey sample aligns with the broader population on key characteristics. Historically, public opinion researchers have adjusted their data using a core set of demographic variables to correct imbalances between the survey sample and the population.

But there is a growing realization among survey researchers that weighting a poll on just a few variables like age, race and gender is insufficient for getting accurate results. Some groups of people – such as older adults and college graduates – are more likely to take surveys, which can lead to errors that are too sizable for a simple three- or four-variable adjustment to work well. Adjusting on more variables produces more accurate results, according to Center studies in 2016 and 2018.

A number of pollsters have taken this lesson to heart. For example, recent high-quality polls by Gallup and The New York Times/Siena College adjusted on eight and 12 variables, respectively. Our own polls typically adjust on 12 variables. In a perfect world, it wouldn’t be necessary to have that much intervention by the pollster. But the real world of survey research is not perfect.
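When pollsters adjust on eight or twelve variables at once, the standard tool is raking (iterative proportional fitting), which repeatedly scales the weights so the sample matches each variable's population margins in turn. A compact sketch with two invented variables and targets (not any pollster's actual procedure):

```python
def rake(respondents, targets, iters=50):
    """Iterative proportional fitting.

    respondents: list of dicts of categorical attributes.
    targets: {variable: {level: population share}} with shares summing to 1.
    Returns one weight per respondent.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iters):
        for var, target in targets.items():
            total = sum(weights)
            for level, share in target.items():
                level_sum = sum(w for r, w in zip(respondents, weights)
                                if r[var] == level)
                factor = (share * total) / level_sum
                weights = [w * factor if r[var] == level else w
                           for r, w in zip(respondents, weights)]
    return weights

# Illustrative respondents and population targets (all numbers invented).
respondents = [
    {"educ": "college", "age": "18-49"},
    {"educ": "college", "age": "50+"},
    {"educ": "no_college", "age": "18-49"},
    {"educ": "no_college", "age": "50+"},
]
targets = {"educ": {"college": 0.38, "no_college": 0.62},
           "age": {"18-49": 0.52, "50+": 0.48}}
weights = rake(respondents, targets)
```

Raking only needs each variable's marginal distribution, not the full cross-tabulation, which is why it scales to the eight- or twelve-variable adjustments mentioned above.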


Predicting who will vote is critical – and difficult. Preelection polls face one crucial challenge that routine opinion polls do not: determining who of the people surveyed will actually cast a ballot.

Roughly a third of eligible Americans do not vote in presidential elections, despite the enormous attention paid to these contests. Determining who will abstain is difficult because people can’t perfectly predict their future behavior – and because many people feel social pressure to say they’ll vote even if it’s unlikely.

No one knows the profile of voters ahead of Election Day. We can’t know for sure whether young people will turn out in greater numbers than usual, or whether key racial or ethnic groups will do so. This means pollsters are left to make educated guesses about turnout, often using a mix of historical data and current measures of voting enthusiasm. This is very different from routine opinion polls, which mostly do not ask about people’s future intentions.
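The "educated guess" described above can be illustrated with a toy likely-voter score that blends past turnout with stated enthusiasm, then weights candidate preferences by that score. The respondents, cutoffs and weights are all invented for the sketch; real likely-voter models vary widely across pollsters:

```python
# Hypothetical respondents: past presidential turnout, enthusiasm (0-10),
# and current candidate preference.
respondents = [
    {"voted_2020": True,  "enthusiasm": 9, "candidate": "A"},
    {"voted_2020": True,  "enthusiasm": 4, "candidate": "B"},
    {"voted_2020": False, "enthusiasm": 8, "candidate": "B"},
    {"voted_2020": False, "enthusiasm": 2, "candidate": "A"},
]

def turnout_score(r):
    # Past turnout counts more than stated enthusiasm (illustrative weights).
    return 0.6 * r["voted_2020"] + 0.4 * (r["enthusiasm"] / 10)

scores = [turnout_score(r) for r in respondents]
# Weight each respondent's preference by their estimated turnout likelihood.
support_a = sum(s for r, s in zip(respondents, scores) if r["candidate"] == "A")
share_a = support_a / sum(scores)
print(f"likely-voter share for A: {share_a:.3f}")
```

The point of the sketch is the sensitivity, not the numbers: change the invented 0.6/0.4 weights and the projected shares move, which is exactly why turnout modeling is a major source of divergence between otherwise similar polls.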

When major news breaks, a poll’s timing can matter. Public opinion on most issues is remarkably stable, so you don’t necessarily need a recent poll about an issue to get a sense of what people think about it. But dramatic events can and do change public opinion, especially when people are first learning about a new topic. For example, polls this summer saw notable changes in voter attitudes following Joe Biden’s withdrawal from the presidential race. Polls taken immediately after a major event may pick up a shift in public opinion, but those shifts are sometimes short-lived. Polls fielded weeks or months later are what allow us to see whether an event has had a long-term impact on the public’s psyche.

How accurate are polls?

The answer to this question depends on what you want polls to do. Polls are used for all kinds of purposes in addition to showing who’s ahead and who’s behind in a campaign. Fair or not, however, the accuracy of election polling is usually judged by how closely the polls matched the outcome of the election.

A diverging bar chart showing polling errors in U.S. presidential elections.

By this standard, polling in 2016 and 2020 performed poorly. In both years, state polling was characterized by serious errors. National polling did reasonably well in 2016 but faltered in 2020.

In 2020, a post-election review of polling by the American Association for Public Opinion Research (AAPOR) found that “the 2020 polls featured polling error of an unusual magnitude: It was the highest in 40 years for the national popular vote and the highest in at least 20 years for state-level estimates of the vote in presidential, senatorial, and gubernatorial contests.”

How big were the errors? Polls conducted in the last two weeks before the election suggested that Biden’s margin over Trump was nearly twice as large as it ended up being in the final national vote tally.

Errors of this size make it difficult to be confident about who is leading if the election is closely contested, as many U.S. elections are.

Pollsters are rightly working to improve the accuracy of their polls. But even an error of 4 or 5 percentage points isn’t too concerning if the purpose of the poll is to describe whether the public has favorable or unfavorable opinions about candidates, or to show which issues matter to which voters. And on questions that gauge where people stand on issues, we usually want to know broadly where the public stands. We don’t necessarily need to know the precise share of Americans who say, for example, that climate change is mostly caused by human activity. Even judged by its performance in recent elections, polling can still provide a faithful picture of public sentiment on the important issues of the day.

The 2022 midterms saw generally accurate polling, despite a wave of partisan polls predicting a broad Republican victory. In fact, FiveThirtyEight found that “polls were more accurate in 2022 than in any cycle since at least 1998, with almost no bias toward either party.” Moreover, a handful of contrarian polls that predicted a 2022 “red wave” largely washed out when the votes were tallied. In sum, if we focus on polling in the most recent national election, there’s plenty of reason to be encouraged.

Compared with other elections in the past 20 years, polls have been less accurate when Donald Trump is on the ballot. Preelection surveys suffered from large errors – especially at the state level – in 2016 and 2020, when Trump was standing for election. But they performed reasonably well in the 2018 and 2022 midterms, when he was not.

During the 2016 campaign, observers speculated about the possibility that Trump supporters might be less willing to express their support to a pollster – a phenomenon sometimes described as the “shy Trump effect.” But a committee of polling experts evaluated five different tests of the “shy Trump” theory and turned up little to no evidence for each one. Later, Pew Research Center and, in a separate test, a researcher from Yale also found little to no evidence in support of the claim.

Instead, two other explanations are more likely. One is about the difficulty of estimating who will turn out to vote. Research has found that Trump is popular among people who tend to sit out midterms but turn out for him in presidential election years. Since pollsters often use past turnout to predict who will vote, it can be difficult to anticipate when irregular voters will actually show up.
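To see why turnout modeling matters, consider a hypothetical likely-voter screen (all figures below are invented for illustration) that down-weights respondents who skipped the last midterm. If those low-propensity voters favor one candidate and then actually show up, the screen understates that candidate's support:

```python
# Hypothetical likely-voter screen (all numbers invented for illustration).
# Each respondent is weighted by an assumed turnout probability inferred
# from whether they voted in the last midterm election.
respondents = [
    ("R", False), ("R", False), ("R", True),  # two low-propensity R voters
    ("D", True), ("D", True),
]
turnout_prob = {True: 0.9, False: 0.3}

# Unweighted share of R supporters in the sample.
raw_r = sum(c == "R" for c, _ in respondents) / len(respondents)

# Weighted share after the likely-voter screen.
weights = {"R": 0.0, "D": 0.0}
for cand, voted_before in respondents:
    weights[cand] += turnout_prob[voted_before]
screened_r = weights["R"] / sum(weights.values())

print(f"Raw R share: {raw_r:.0%}, after likely-voter screen: {screened_r:.0%}")
```

The screen flips the apparent leader: the raw sample has R ahead, but discounting the two irregular R voters puts D ahead. If those voters turn out anyway, the poll misses their support.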

The other explanation is that Republicans in the Trump era have become a little less likely than Democrats to participate in polls. Pollsters call this “partisan nonresponse bias.” Surprisingly, polls historically have not shown any particular pattern of favoring one side or the other. The errors that favored Democratic candidates in the past eight years may be a result of the growth of political polarization, along with declining trust among conservatives in news organizations and other institutions that conduct polls.
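A toy simulation can show how even a modest response-rate gap produces this kind of bias. All the rates below are invented for illustration:

```python
import random

random.seed(1)

# Hypothetical electorate split 50/50 between two candidates. Suppose one
# side answers polls at a 4% rate and the other at 6% (invented rates).
electorate = [("R", 0.04)] * 50_000 + [("D", 0.06)] * 50_000

# Simulate who actually responds to the poll.
respondents = [cand for cand, rate in electorate if random.random() < rate]
d_share = sum(c == "D" for c in respondents) / len(respondents)
print(f"D share among respondents: {d_share:.1%} (true share: 50.0%)")
```

Despite a true 50/50 split, the poll shows roughly a 60/40 split, a far larger distortion than the reported margin of sampling error would suggest. This is why pollsters weight their samples to correct for known response-rate differences.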

Whatever the cause, the fact that Trump is again the nominee of the Republican Party means that pollsters must be especially careful to make sure all segments of the population are properly represented in surveys.

The real margin of error is often about double the one reported. A typical election poll sample of about 1,000 people has a margin of sampling error that’s about plus or minus 3 percentage points. That number expresses the uncertainty that results from taking a sample of the population rather than interviewing everyone. Random samples are likely to differ a little from the population just by chance, in the same way that the quality of your hand in a card game varies from one deal to the next.
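The ±3-point figure follows from the standard formula for the 95% margin of sampling error on a proportion, z·√(p(1−p)/n). A minimal sketch (the sample sizes are illustrative):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of sampling error for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (500, 1000, 2000):
    print(f"n={n}: +/-{margin_of_error(n):.1f} points")
```

A sample of 1,000 yields about ±3.1 points. Note that this quantifies sampling error only, not the other sources of error a poll can suffer.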

A table showing that sampling error is not the only kind of polling error.

The problem is that sampling error is not the only kind of error that affects a poll. Those other kinds of error, in fact, can be as large or larger than sampling error. Consequently, the reported margin of error can lead people to think that polls are more accurate than they really are.

There are three other, equally important sources of error in polling: noncoverage error, where not all of the target population has a chance of being sampled; nonresponse error, where certain groups of people may be less likely to participate; and measurement error, where people may not properly understand the questions or may misreport their opinions. Not only does the margin of error fail to account for these other sources of potential error, but putting a number only on sampling error implies to the public that other kinds of error do not exist.

Several recent studies show that the average total error in a poll estimate may be closer to twice as large as that implied by a typical margin of sampling error. This hidden error underscores the fact that polls may not be precise enough to call the winner in a close election.

Other important things to remember

Transparency in how a poll was conducted is associated with better accuracy. The polling industry has several platforms and initiatives aimed at promoting transparency in survey methodology, including AAPOR’s transparency initiative and the Roper Center archive. Polling organizations that participate in these initiatives have less error, on average, than those that do not, an analysis by FiveThirtyEight found.

Participation in these transparency efforts does not guarantee that a poll is rigorous, but it is undoubtedly a positive signal. Transparency in polling means disclosing essential information, including the poll’s sponsor, the data collection firm, where and how participants were selected, modes of interview, field dates, sample size, question wording, and weighting procedures.

There is evidence that when the public is told that a candidate is extremely likely to win, some people may be less likely to vote. Following the 2016 election, many people wondered whether the pervasive forecasts that seemed to all but guarantee a Hillary Clinton victory – two modelers put her chances at 99% – led some would-be voters to conclude that the race was effectively over and that their vote would not make a difference. There is scientific research to back up that claim: A team of researchers found experimental evidence that when people have high confidence that one candidate will win, they are less likely to vote. This helps explain why some polling analysts say elections should be covered using traditional polling estimates and margins of error rather than speculative win probabilities (also known as “probabilistic forecasts”).

National polls tell us what the entire public thinks about the presidential candidates, but the outcome of the election is determined state by state in the Electoral College. The 2000 and 2016 presidential elections demonstrated a difficult truth: The candidate with the largest share of support among all voters in the United States sometimes loses the election. In those two elections, the national popular vote winners (Al Gore and Hillary Clinton) lost the election in the Electoral College (to George W. Bush and Donald Trump). In recent years, analysts have shown that Republican candidates do somewhat better in the Electoral College than in the popular vote because every state gets three electoral votes regardless of population – and many less-populated states are rural and more Republican.
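The small-state effect can be made concrete with rough figures; the populations below are rounded 2020-census numbers and the electoral-vote counts are the post-2020 allocations:

```python
# Approximate figures for illustration: rounded 2020-census populations
# and post-2020 electoral-vote allocations.
states = {
    "Wyoming": {"pop": 580_000, "ev": 3},
    "California": {"pop": 39_500_000, "ev": 54},
}

ev_per_million = {
    name: s["ev"] / (s["pop"] / 1_000_000) for name, s in states.items()
}
for name, rate in ev_per_million.items():
    print(f"{name}: {rate:.2f} electoral votes per million residents")
```

On these rough numbers, a Wyoming resident carries more than three times the Electoral College weight of a California resident.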

For some, this raises the question: What is the use of national polls if they don’t tell us who is likely to win the presidency? In fact, national polls try to gauge the opinions of all Americans, regardless of whether they live in a battleground state like Pennsylvania, a reliably red state like Idaho or a reliably blue state like Rhode Island. In short, national polls tell us what the entire citizenry is thinking. Polls that focus only on the competitive states run the risk of giving too little attention to the needs and views of the vast majority of Americans who live in uncompetitive states – about 80%.

Fortunately, this is not how most pollsters view the world . As the noted political scientist Sidney Verba explained, “Surveys produce just what democracy is supposed to produce – equal representation of all citizens.”

Scott Keeter is a senior survey advisor at Pew Research Center.

Courtney Kennedy is Vice President of Methods and Innovation at Pew Research Center.


© 2024 Pew Research Center

The Study of Election Campaigning

  • Shaun Bowler &
  • David M. Farrell  

Part of the book series: Contemporary Political Studies (CONTPOLSTUD)

Election campaigns attract great attention from voters, media and academics alike. The academics, however, tend to focus their research on the electoral result and on societal and long-term political factors influencing that result. The election campaign — the event of great interest, which has at least some role to play in affecting the result — is usually passed over or at most receives minimal attention. It is generally left to the journalists and pundits to give their insights into the campaign; scanning every television programme and newspaper for the latest news or gossip, scrutinising every campaign development — whether an initiative or gaffe — for its potential effect on the result. These are ‘the boys on the bus,’ the campaign journalists who, emulating Theodore White (1961), provide fascinating accounts of the nitty-gritty of election campaigning. 1 But such studies emphasise the short-term and the ephemeral, rather than the underlying process of any campaign. They necessarily stress the unique rather than the general and as such promote the view of campaigns and campaigning as behaviour specific to each election, indeed to each party.



Abrams, M. (1964), ‘Opinion Polls and Party Propaganda’, Public Opinion Quarterly , 28, pp. 13–29.

Agranoff, R. (ed.) (1976a), The New Style in Election Campaigns , 2nd edn (Boston: Halbrook Press).

Agranoff, R. (ed.) (1976b), The Management of Election Campaigns (New York: Halbrook Press).

Alexander, H. (ed.), (1989a), Comparative Political Finance in the 1980s (Cambridge: Cambridge University Press).

Alexander, H. (ed.), (1989b), ‘Money and Politics: Rethinking a Conceptual Framework’, in H. Alexander (ed.), Comparative Political Finance in the 1980s (Cambridge: Cambridge University Press).

Arndt, J. (1978), ‘How Broad Should the Marketing-Concept Be?’, Journal of Marketing , 43, pp. 101–3.

Atkinson, M. (1984), Our Masters’ Voices: The Language and Body Language of Politics (London: Methuen).

Bartels, R. (1974) ‘The Identity Crisis in Marketing’, Journal of Marketing , 38, pp. 73–6.

Bernays, E. (ed.), (1955), The Engineering of Consent (Norman: University of Oklahoma Press).

Bochel, J.M. and Denver, D. (1971), ‘Canvassing, Turnout and Party Support: An Experiment’, British Journal of Political Science , 1, pp. 257–69.

Boim, D. (1984), ‘The Telemarketing Center: Nucleus of a Modem Campaign’, Campaigns and Elections , 5, pp. 73–8.

Bowler, S. (1990a), ‘Consistency and Inconsistency in Canadian Party Identifications: Towards an Institutional Approach’, Electoral Studies , 9, pp. 133–47.

Bowler, S. (1990b), ‘Voter Perceptions and Party Strategies: An Empirical Approach’, Comparative Politics , 23, pp. 61–83.

Budge, I. and Farlie, D. (1983), Explaining and Predicting Elections: Issue Effects and Party Strategies in Twenty-Three Democracies (London: George Allen & Unwin).

Butler, D. and Kavanagh, D. (1988), The British General Election of 1987 (London: Macmillan).

Carman, J. (1973), ‘On the Universality of Marketing’, Journal of Contemporary Business , 2, p. 14.

Chagall, D. (1981), The New King-Makers (New York and London: Harcourt Brace Jovanovich).

Chartrand, R. (1972), Computers and Political Campaigning (New York: Spartan Books).

Clark, E. (1981), ‘The Lists Business Boom’, Marketing , (December) pp. 25–8.

Cockerell, M., Hennessy, P. and Walker, D. (1984), Sources Close to the Prime Minister: Inside the Hidden World of the News Manipulators (London: Macmillan).

Crewe, I. and Harrop, M. (eds) (1986), Political Communications: The General Election Campaign of 1983 (Cambridge: Cambridge University Press).

Crewe, I. and Harrop, M. (eds) (1989), Political Communications: The General Election Campaign of 1987 (Cambridge: Cambridge University Press).

Crotty, W. L. (1971), ‘Party Effort and its Impact on the Vote’, American Political Science Review , 65, pp. 439–50.

Crouse, T. (1972), The Boys on the Bus (New York: Random House).

Curtis, G. (1988), The Japanese Way of Politics (New York: Columbia University Press).

Cutright, P. (1963), ‘Measuring the Impact of Local Party Activity on the General Election Vote’, Public Opinion Quarterly , 27, pp. 372–86.

Dalton, R., Flanagan, S. and Beck, P. (eds) (1984), Electoral Change in Advanced Industrial Democracies: Realignment or Dealignment? (Princeton: Princeton University Press).

Diamond, E. and Bates, S. (1984), The Spot: The Rise of Political Advertising on Television (Cambridge, Mass.: MIT Press).

Diamond, E. and Bates, S. (1985), ‘The Ads’, Public Opinion , 7, pp. 55–7, 64.

Downs, A. (1957), An Economic Theory of Democracy (New York: Harper & Row).

Eldersveld, S.J. (1956), ‘Experimental Propaganda Techniques and Voting Behavior’, American Political Science Review , 50, pp. 154–65.

Elklit, J. (1991), ‘Sub-National Election Campaigns: The Danish Local Elections of November 1989’, Scandinavian Political Studies , 14, pp. 219–39.

Eriksson, E.M. (1937), ‘President Jackson’s Propaganda Agencies’, Pacific Historical Review , 6, pp. 47–57.

Farrell, D. (1986), ‘The Strategy to Market Fine Gael in 1981’, Irish Political Studies , 1, pp. 1–14.

Farrell, D. (1989), ‘Changes in the European Electoral Process: A Trend Towards ‘Americanization’?’, Manchester Papers in Politics , no.6/89.

Farrell, D. and Wortmann, M. (1987), ‘Party Strategies in the Electoral Market: Political Marketing in West Germany, Britain, and Ireland’, European Journal of Political Research , 15, pp. 297–318.

Gosnell, H. (1927), Getting out the Vote: An Experiment in the Stimulation of Voting (Chicago: University of Chicago Press).

Graham, R. (1984), Spain: Change of a Nation (London: Michael Joseph).

Haggerty, B. (1979), ‘Direct Mail Political Fund Raising’, Public Relations Journal , 35, pp. 10–13.

Harris, P.C. (1982), ‘Politics by Mail: A New Platform’, The Wharton Magazine (Fall), pp. 16–19.

Harrop, M. and Miller, W. L. (1987), Elections and Voters: A Comparative Introduction (Basingstoke: Macmillan).

Hiebert, R., Jones, R., d’Arc Lorenz, J. and Lotito, E. (eds) (1975), The Political Image Merchants: Strategies for the Seventies (Washington: Acropolis Books).

Hofstetter, C. R. and Zukin, C. (1979), ‘TV Network Political News and Advertising in the Nixon and McGovern Campaigns’, Journalism Quarterly , 56, pp. 106–15, 152.

Irvine, W. (1987), ‘Canada, 1945–1980: Party Platforms and Campaign Strategies’, in I. Budge et al., Ideology, Strategy and Party Change (Cambridge: Cambridge University Press).

Jamieson, K. H. (1984), Packaging the Presidency: A History and Criticism of Presidential Campaign Advertising (Oxford: Oxford University Press).

Katz, D. and Eldersveld, S. (1961), ‘The Impact of Local Party Activity upon the Electorate’, Public Opinion Quarterly , 25, pp. 1–24.

Katz, R. (1980), A Theory of Parties and Electoral Systems (Baltimore: Johns Hopkins University Press).

Kelley, S. (1956), Professional Public Relations and Political Power (Baltimore: Johns Hopkins University Press).

Kirchheimer, O. (1966), ‘The Transformation of Western European Party Systems’, in J. LaPalombara and M. Weiner (eds), Political Parties and Political Development (Princeton: Princeton University Press).

Kotler, P. (1972), ‘A Generic Concept of Marketing’, Journal of Marketing , 36, pp. 46–54.

Kotler, P. (1975), ‘Political Candidate Marketing’, in P. Kotler (ed.), Marketing for Non-Profit Organizations (Englewood Cliffs, NJ: Prentice-Hall).

Kotler, P. (1980), Marketing Management: Analysis, Planning and Control , 4th edn (Englewood Cliffs, NJ: Prentice-Hall).

Kotler, P. and Levy, S. J. (1969a), ‘Broadening the Concept of Marketing’, Journal of Marketing , 33, pp. 10–15.

Kotler, P. and Levy, S. J. (1969b), ‘A New Form of Marketing Myopia: Rejoinder to Prof. Luck’, Journal of Marketing , 33, pp. 55–7.

Kramer, G. (1970), ‘The Effects of Precinct-Level Canvassing on Voter Behavior’, Public Opinion Quarterly , 34, pp. 560–72.

Kurjian, D. (1984), ‘Expressions Win Elections’, Campaigns and Elections , 5, pp. 6–11.

Lindon, D. (1976), Marketing Politique et Social (Paris: Dalloz).

Luck, D. J. (1969), ‘Broadening the Concept of Marketing — Too Far’, Journal of Marketing , 33, pp. 53–5.

Luck, D. J. (1974), ‘Social Marketing: Confusion Compounded’, Journal of Marketing , 38, p. 70.

Luntz, F. (1988), Candidates, Consultants and Campaigns (Oxford: Basil Blackwell).

Lupfer, M. and Price, D. (1972), ‘On the Merits of Face-to-Face Campaigning’, Social Science Quarterly , 55, pp. 534–43.

Mannelli, G. and Cheli, E. (1986), L’immagine del potere: Comportamenti, atteggiamenti e strategie d’immagine dei leader politici italiani (Milan: Franco Angeli Libri).

Martel, M. (1983), Political Campaign Debates: Images, Strategies and Tactics (New York: Longman).

Mauser, G. (1983), Political Marketing: An Approach to Campaign Strategy (New York: Praeger).

Mintz, E. (1985), ‘Election Campaign Tours in Canada’, Political Geography Quarterly , 4, pp. 47–54.

Napolitan, J. (1972), The Election Game (New York: Doubleday).

Nimmo, D. (1970), The Political Persuaders: The Techniques of Modern Election Campaigns (Englewood Cliffs, NJ: Prentice-Hall).

O’Shaughnessy, N.J. (1990), The Phenomenon of Political Marketing (Basingstoke: Macmillan).

O’Shaughnessy, N.J. and G. Peele (1985), ‘Money, Mail and Markets: Reflections on Direct Mail in American Politics’, Electoral Studies , 4, pp. 115–24.

Pedersen, M. (1983), ‘Changing Patterns of Electoral Volatility in European Party Systems, 1948–1977: Explorations in Explanation’, in H. Daalder and P. Mair (eds), West European Party Systems (Beverly Hills: Sage).

Peele, G. (1982), ‘Campaign Consultants’, Electoral Studies , 1, pp. 355–62.

Pitchell, R. J. (1958), ‘Influence of Professional Campaign Management Firms in Partisan Elections in California’, Western Political Quarterly , 11, pp. 278–300.

Poguntke, T. (1989), ‘The ‘New Politics Dimension’ in European Green Parties’, in F. Müller-Rommel (ed.), New Politics in Western Europe: The Rise and Success of Green Parties and Alternative Lists (Boulder, Co.: Westview Press).

Price, D. and Lupfer, M. (1973), ‘Volunteers for Gore: The Impact of a Precinct-Level Canvass in Three Tennessee Cities’, Journal of Politics , 35, pp. 410–38.

Robertson, D. (1976), A Theory of Party Competition (London: Wiley).

Robinson, R. (1989), ‘Coalitions and Political Parties in Sub-National Government: The Case of Spain’, in C. Mellors and B. Pijnenburg (eds), Political Parties and Coalitions in European Local Government (London: Routledge).

Roll, C. (1982), ‘Private Opinion Polls’, in G. Benjamin (ed.), The Communication Revolution in Politics (New York: Academy of Political Science).

Rose, R. (1967), Influencing Voters: A Study of Campaign Rationality (New York: St Martin’s Press).

Ross, I. (1959), ‘The Super-Salesmen of California Politics: Whitaker and Baxter’, Harper’s Magazine , (July), pp. 55–61.

Rowland, R. and Payne, R. (1984), ‘The Context-Embeddedness of Political Discourse: A Re-evaluation of Reagan’s Rhetoric in the 1982 Midterm Election Campaign’, Presidential Studies Quarterly , 14, pp. 500–11.

Sabato, L. (1981), The Rise of Political Consultants: New Ways of Winning Elections (New York: Basic Books).

Sabato, L. (1985), PAC Power: Inside the World of Political Action Committees (New York: W. W. Norton).

Shadegg, S. (1964), How to Win an Election: The Art of Political Victory (New York: Taplinger).

Shadegg, S. (1972), The New How to Win an Election (New York: Taplinger).

Shama, A. (1975), ‘Political Marketing: A Study of Voter Decision-Making Process and Candidate Marketing Strategy’, in R. Curran (ed.), 1974 Combined Proceedings Series No. 34 (Michigan: American Marketing Association).

Shyles, L. (1984a), ‘Defining “Images” of Presidential Candidates from Televised Political Spot Advertisements’, Political Behavior , 6, pp. 171–81.

Shyles, L. (1984b), ‘The Relationship of Images, Issues and Presidential Methods in Televised Spot Advertisements for 1980s American Presidential Primaries’, Journal of Broadcasting , 28, pp. 405–21.

Smith, A. (1981), ‘Mass Communications’, in D. Butler et al., Democracy at the Polls: A Comparative Study of National Elections (Washington, DC: American Enterprise Institute).

Snyder, J. D. (1982), ‘Playing Politics by Mail’, Sales and Marketing Management , (July), pp. 44–6.

Statera, G. (1986), La Politica Spettacolo: Politici e Mass Media Nell’era Dell’immagine (Milan: Mondadori).

Steinberg, A. (1976a), Political Campaign Management: A Systems Approach (Lexington, Mass.: D. C. Heath).

Steinberg, A. (1976b), The Political Campaign Handbook: Media, Scheduling and Advance (Lexington, Mass.: D. C. Heath).

Tobe, F. (1984), ‘New Techniques in Computerized Voter Contact’, Campaigns and Elections , 5, pp. 56–64.

Tyler, R. (1987), Campaign! The Selling of the Prime Minister (London: Grafton Books).

Wangen, E. (1983), Polit-Marketing: Das Marketing-Management der Politischen Parteien (Opladen: Westdeutsher Verlag).

Weir, B. (1985), ‘The American Tradition of the Experimental Treatment of Elections: A Review Essay’, Electoral Studies , 4, pp. 125–33.

West, D. (1984), ‘Cheers and Jeers: Candidate Presentations and Audience Reactions in the 1980 Presidential Election’, American Politics Quarterly , 12, pp. 23–50.

White, T. (1961), The Making of the President, 1960 (New York: Atheneum).

Witherspoon, J. (1984), ‘Campaign Commercials and the Media Blitz’, Campaigns and Elections , 5, pp. 6–20.

Woo, L.C. (1980), The Campaign Organizer’s Manual (Durham, North Carolina: Carolina Academic Press).

Worcester, R. and Harrop, M. (eds) (1982), Political Communications: The General Election Campaign of 1979 (London: George Allen & Unwin).

Wright, W. (1971), ‘Comparative Party Models: Rational-Efficient and Party Democracy’, in W. Wright (ed.), A Comparative Study of Party Organization (Ohio: Charles E. Merrill).

Editor information

Editors and Affiliations

Department of Political Science, University of California, USA

Shaun Bowler (Assistant Professor)

Department of Government, University of Manchester, UK

David M. Farrell (Jean Monnet Lecturer in European Politics)

Copyright information

© 1992 The Macmillan Press Ltd

About this chapter

Bowler, S., Farrell, D.M. (1992). The Study of Election Campaigning. In: Bowler, S., Farrell, D.M. (eds) Electoral Strategies and Political Marketing. Contemporary Political Studies. Palgrave Macmillan, London. https://doi.org/10.1007/978-1-349-22411-1_1

DOI: https://doi.org/10.1007/978-1-349-22411-1_1

Publisher Name: Palgrave Macmillan, London

Print ISBN: 978-1-349-22413-5

Online ISBN: 978-1-349-22411-1

eBook Packages: Palgrave Political & Intern. Studies Collection; Political Science and International Studies (R0)



PeerJ Computer Science (PMC10495957)

On the frontiers of Twitter data and sentiment analysis in election prediction: a review

Quratulain Alvi, Syed Farooq Ali, Sheikh Bilal Ahmed, Nadeem Ahmad Khan, Mazhar Javed and Haitham Nobanee

Affiliations:

1. Department of Software Engineering, University of Management and Technology, Lahore, Punjab, Pakistan
2. Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences, Lahore, Punjab, Pakistan
3. Faculty of Humanities and Social Sciences, University of Liverpool, Liverpool, United Kingdom
4. College of Business, Abu Dhabi University, Abu Dhabi, United Arab Emirates
5. Oxford Centre for Islamic Studies, University of Oxford, Oxford, United Kingdom

Election prediction using sentiment analysis is a rapidly growing field that applies natural language processing and machine learning techniques to predict the outcome of political elections by analyzing the sentiment of online conversations and news articles. Sentiment analysis, or opinion mining, uses text analysis to identify and extract subjective information from text data sources. In the context of election prediction, sentiment analysis can be used to gauge public opinion and anticipate the likely winner of an election. Significant progress has been made in election prediction over the last two decades, but a comprehensive view of that progress is easier to obtain once the work has been classified by approach, citation impact, and technology. The main objective of this article is to examine and consolidate the progress made in research on election prediction using Twitter data. The aim is to provide a comprehensive overview of the current state-of-the-art practices in this field while identifying potential avenues for further research and exploration.

Introduction

The emergence of the information revolution has led to new economies centered around the flow of data, information, and knowledge ( Serrat & Serrat, 2017 ). The Internet has brought about a significant transformation in content consumption. The vast amounts of data generated, coupled with the rapid dissemination of information and its easy accessibility, have turned social platforms into prime examples of interactions among millions of individuals ( Gadek et al., 2017 ). These individuals actively engage with the shared content, effectively transforming their networks into successful platforms for exchanging information ( Hu & Wang, 2020 ). Social media (SM) emerged in the late 1990s and captured the world’s attention by enabling people to communicate with distant contacts and make new friends; this convenience proved habit-forming as more and more social networking sites appeared.

Social networking sites allow people to express their thoughts, ideas, opinions, and feelings on various worldly and social matters through reactions, comments, or shared posts ( Ceron, Curini & Iacus, 2015b ). In recent years, the exponential growth of social media and social networking sites such as Facebook and Twitter has begun to play a growing role in real-world politics ( Cottle, 2011 ).

Facebook and Twitter played a facilitating role for individuals, industries, and political nations worldwide ( Segerberg & Bennett, 2011 ; Liao et al., 2018 ). Political parties such as Swedish Pirate Party, German Pirate Party, and Italy’s Five Star Movement Party used social networking sites to send the agenda to the whole country ( Metzgar & Maruggi, 2009 ).

The USA election campaigns of 2008, 2012, and 2016 demonstrated the ground-breaking effect of SM on the general population of the United States. Obama was the first politician to effectively utilize SM as a campaign strategy ( Smith, 2009 ). By the end of the campaign, his team knew the names of every one of the 69,406,897 citizens who were prepared to vote for him. To persuade voters, the campaign hired an IT specialist in data mining and machine learning who sent customized messages for cost-effective outreach ( Vitak et al., 2011 ). In 2016, overall digital enthusiasm for Trump was three times higher than for Clinton, as indicated by Google Trends analysis, which contributed to his victory in the elections ( MLLC, 2015 ). Donald Trump was the most mentioned person on Twitter and Facebook, with over 4 million more Twitter followers than Clinton ( Stromer-Galley, 2014 ).

Following suit, many other countries, such as Sweden ( Larsson & Moe, 2012 ; Strömbäck & Dimitrova, 2011 ), India ( Rajput, 2014 ; Pal, Chandra & Vydiswaran, 2016 ) and Pakistan ( Ahmed & Skoric, 2014 ; Razzaq, Qamar & Bilal, 2014 ), have also made extensive use of SM in successful campaigns in recent political history. The research community has applied various data analysis and mining processes to uncover hidden patterns in the vast quantities of data gathered from SM, analyzing user sentiment from the text written on users’ profiles. This behavioral study is called sentiment analysis (SA) ( Carlisle & Patton, 2013 ). Numerous election prediction studies have been conducted using Twitter data based on sentiment analysis ( Carlisle & Patton, 2013 ; Rajput, 2014 ); the rest are discussed in the later sections.

After the USA elections (2008, 2012, 2016) and the Pakistan election of 2013, the role of social media in politics, viewed through the lens of sentiment analysis, has been widely studied and examined ( Carlisle & Patton, 2013 ; Wolfsfeld, Segev & Sheafer, 2013 ; Ahmed & Skoric, 2014 ; Razzaq, Qamar & Bilal, 2014 ; Safdar et al., 2015 ). Much of this research performed election prediction using Twitter data based on sentiment analysis ( He et al., 2019 ; Ahmed & Skoric, 2014 ; Razzaq, Qamar & Bilal, 2014 ; Bagheri & Islam, 2017 ; Wang et al., 2012 ; Younus et al., 2014 ; Kagan, Stevens & Subrahmanian, 2015 ; Nickerson & Rogers, 2014 ). Numerous studies explore the realm of social media prediction, opinion mining, and information network mining techniques to establish standardized approaches to assess the predictive capabilities and limitations associated with the information embedded within social media data ( Cambria, 2016 ; Kreiss, 2016 ; Mahmood et al., 2013 ).

The primary motivation of this study is to contribute to the existing body of scientific literature on sentiment analysis by focusing on its application in election prediction using Twitter data. This study aims to delve deeper into aspects that may have received limited attention in previous works. Through a systematic, comprehensive, and detailed method, this review offers a fresh perspective on the causal factors influencing temporal sentiment analysis in social media to stimulate further discussions and considerations for enhancing future studies in this domain. Furthermore, an integral part of our work, which we plan to expand in future research, is a practical evaluation of the applicability and reproducibility of existing and upcoming techniques. While these approaches exhibit impressive results, their practical implementation can be challenging. By offering insights into their potential limitations, we aim to provide a realistic outlook for their utilization.

There are generally three main levels of sentiment analysis: document level, sentence level, and aspect level. Document-level sentiment analysis analyzes the overall sentiment of a document, such as a blog post or news article. Sentence-level sentiment analysis analyzes the sentiment of individual sentences within a document, while aspect-level sentiment analysis analyzes the sentiment expressed towards a specific aspect or feature of an entity, such as the battery life of a smartphone. Significant progress has been made in election prediction in the last two decades. This survey paper examines the use of sentiment analysis for predicting election outcomes, identifies research gaps, and proposes future research directions. The remainder of the article is structured as follows: the Literature Review section provides a theoretical framework by reviewing the literature supporting the study; the methodologies employed are outlined in the Methodology section; the Results section discusses the primary insights and findings derived from the study; finally, the Conclusion summarizes the main findings and limitations and highlights potential areas for future research.
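To make the three levels concrete, the toy sketch below scores a short review at the document, sentence, and aspect level. The mini-lexicon, the averaging scheme, and the keyword-based aspect matching are illustrative assumptions for exposition, not a production technique:

```python
import re

# Toy sentiment lexicon; the word scores are illustrative, not from any real resource.
LEXICON = {"great": 1.0, "good": 0.5, "poor": -0.5, "terrible": -1.0}

def sentence_score(sentence):
    """Sentence-level: average lexicon score of the words in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def document_score(document):
    """Document-level: mean of the sentence-level scores."""
    sentences = [s for s in re.split(r"[.!?]", document) if s.strip()]
    return sum(sentence_score(s) for s in sentences) / len(sentences)

def aspect_score(document, aspect):
    """Aspect-level: score only the sentences that mention the given aspect."""
    sentences = [s for s in re.split(r"[.!?]", document) if aspect in s.lower()]
    if not sentences:
        return 0.0
    return sum(sentence_score(s) for s in sentences) / len(sentences)

review = "The camera is great. The battery life is terrible."
print(document_score(review))            # overall opinion: the two sentences cancel out
print(aspect_score(review, "battery"))   # opinion about the battery aspect only
```

The same review thus yields a neutral document-level score but a clearly negative aspect-level score for "battery", which is exactly the distinction the three levels capture.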

Theoretical framework

The exploration of election prediction using Twitter data and sentiment analysis has yet to be thoroughly examined within academia, indicating the need for an extensive survey of existing research in this domain. While some surveys have been conducted, they primarily focus on utilizing various social media platforms for election prediction, and others are outdated or lack comprehensive coverage of all aspects of election prediction using Twitter data. A recent survey (Nayeem, Sachi & Kumar, 2023) in this field presented the most significant publications analyzing election prediction using different social media platforms. Some articles (Baydogan & Alatas, 2022; Chakarverti, 2023; Baydogan & Alatas, 2021a) have focused on evaluating the performance of artificial intelligence-based algorithms for hate speech detection and present novel approaches for automatically detecting online hate speech. Baydogan & Alatas (2021b) proposed the use of the Social Spider Optimization algorithm for sentiment analysis in social networks, while Baydogan & Alatas (2018) explored the use of the Konstanz Information Miner (KNIME) platform for sentiment analysis in social networks. Another paper, by Rodríguez-Ibánez et al. (2023), examined the existing methods and causal effects of sentiment analysis, particularly in domains like stock market value, politics, and cyberbullying in educational centers. The paper highlighted that research efforts are not evenly distributed across fields, with more emphasis on marketing, politics, economics, health, etc. Yu & Kak (2012) surveyed the domains that can be predicted using current social media data, presenting a comprehensive overview of the existing methods and data sources used in past papers to predict election results. Kwak & Cho (2018) presented a survey exploring the insights gained and limitations encountered when utilizing social media data.
The paper further examined approaches to overcome these limitations and proposed effective ways of using social media data to understand public opinion in electoral contexts. Bilal et al. (2019) presented a survey listing the methods and data sources used in past efforts to predict election outcomes. In Rousidis, Koukaras & Tjortjis (2020), the authors examined current and emerging areas of social media prediction since 2015, specifically focusing on the predictive models employed; they reviewed the literature, statistical analyses, methods, algorithms, techniques, prediction accuracy, and challenges. However, that paper did not concentrate on a specific field such as politics. In Skoric, Liu & Jaidka (2020), the authors presented a meta-analysis examining the predictive capability of social media data across various data sources and prediction methods. The analysis revealed that machine-learning-based approaches outperform lexicon-based methods, and that combining structural features with sentiment analysis yields the most accurate predictions. Kubin & Von Sikorski (2021) investigated the influence of social media on political polarization; the study highlighted a heavy emphasis on Twitter and American samples while noting a scarcity of research exploring how social media can reduce polarization. The work in Cano-Marin, Mora-Cantallops & Sánchez-Alonso (2023) provided an evaluation and classification of the predictive potential of Twitter, identifying gaps and opportunities in developing predictive applications of user-generated content on Twitter.

Methodology

A systematic literature review was conducted following a six-step guideline for management research (Drus & Khalid, 2019): formulating the research questions, identifying the necessary criteria for the study, retrieving potentially relevant literature, analyzing the relevant information gathered from the literature, and reporting the results of the review. The current study addresses the following two questions:

  • 1. What approaches are proposed by the research community to analyze the role of SM, especially Twitter, in politics?
  • 2. How can we divide the research done in this area into different time-based intervals (eras), and what are the main strengths and weaknesses of each era?

To address the research questions, we conducted a systematic literature review (SLR) following the guidelines provided by Kitchenham (2004), which have been used in many surveys across different fields. These guidelines emphasize the importance of identifying the need for the review, determining the relevant data sources, describing the review process comprehensively, presenting the results clearly, and identifying research gaps to facilitate further investigation. To ensure the inclusion of recent and up-to-date methodologies employed by researchers, we collected a substantial corpus of 250 documents spanning from 2008 (after the launch of Twitter in March 2006) to March 2023.

To curate our dataset, we utilized multiple databases to filter publications based on publication dates. We extracted papers from the first three pages of the search results, ensuring a well-balanced dataset by prioritizing the most cited publications. We only selected the research papers, not surveys or reports. Through this SLR, we successfully analyzed 80 papers that conformed to our predefined criteria. The detailed stages are explained below. Figure 1 exhibits the visual representation of the methodology.

[Figure 1: Visual representation of the methodology (peerj-cs-09-1517-g001.jpg).]

Stage 1: Screening

We collected 250 articles focusing on the elections of the USA 2008, Arab Spring 2010, USA 2012, Pakistan 2013, India 2014, USA 2016, and Pakistan 2018 from various databases such as IEEE, Springer, Emerald Insight, Science Direct, Scopus, and Association for Computing Machinery (ACM). The search criteria were based on keywords such as sentiment analysis, predicting election results and election prediction classification using social media, election prediction using sentiment analysis, election prediction using Twitter data, sentiment analysis using Twitter data, and social network analysis through sentiment analysis. The resultant articles were then analyzed based on the title and abstract of the articles. After the analysis, only those papers that directly correlated with election prediction and had valid digital object identifiers (DOI) were selected. In doing so, 162 articles were selected by the end of the screening process.

Stage 2: Eligibility analysis

After the screening process, publications based on datasets other than Twitter (such as Facebook data or surveys), along with papers whose purpose was not to use Twitter as a predictive system, were excluded from our repository. The final number of eligible articles was thus narrowed down to 80 papers, since the study focuses on election prediction using Twitter data. The resultant repository was sufficient for drawing conclusions and inferences about the impact of SM and the different classification approaches applied to elections in various countries.

Pre-processing techniques

Pre-processing techniques are applied to the raw dataset to obtain a formatted, error-free dataset. The relevant algorithm(s) then use this processed dataset to achieve maximum accuracy with minimal degradation in performance.

Stemming

Stemming is a technique that reduces all variations of a word to its root, so that 'warming', 'warmest', 'warmed', and 'warmer' are all stemmed to 'warm'. This method reduces time and memory usage by removing suffixes from words that share the same meaning and stem. For sentiment analysis of text data, every word should be represented by its stem rather than the surface form mentioned in the text (Al-Khafaji & Habeeb, 2017).

Stop word removal

Stop words are words that provide no helpful information in the raw dataset, so they are removed to save computation time and storage and to improve the algorithm's efficiency. Most stop words are function words such as pronouns, articles, prepositions, and auxiliary verbs, e.g., is, of, the, to, and, or (Al-Khafaji & Habeeb, 2017).

Tokenization

Tokenization is a method of splitting a sentence into its constituent units; each split character, word, or symbol is called a token. It is a standard step in text analysis (Al-Khafaji & Habeeb, 2017). For example, [the president has worked well] is tokenized into [the, president, has, worked, well] (Wongkar & Angdresey, 2019). These tokens help identify the intent of a piece of content, which aids sentiment or text analysis.
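The three pre-processing steps described above can be sketched as a single pipeline. The stop-word set and the suffix-stripping stemmer below are deliberately tiny illustrative stand-ins; real systems use curated stop-word lists and a full Porter stemmer:

```python
import re

# Illustrative stop-word set; real systems use larger curated lists.
STOP_WORDS = {"the", "is", "of", "to", "has", "and", "a", "an"}

def tokenize(text):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that carry little sentiment information."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Naive suffix-stripping stemmer (a toy stand-in for Porter stemming)."""
    for suffix in ("ing", "est", "ed", "er"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tweet = "The president has worked well"
tokens = tokenize(tweet)                # ['the', 'president', 'has', 'worked', 'well']
filtered = remove_stop_words(tokens)    # ['president', 'worked', 'well']
stems = [stem(t) for t in filtered]     # ['president', 'work', 'well']
print(stems)
```

In a real pipeline the stemmed tokens would then feed the feature extraction stage (n-grams, TF-IDF, etc.) of the classifiers surveyed below.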

Election prediction approaches

This section classifies all the research papers into various approaches. The taxonomy of these approaches is presented in Fig. 2 .

[Figure 2: Taxonomy of election prediction approaches (peerj-cs-09-1517-g002.jpg).]

Statistical approach

The authors of Ibrahim et al. (2015) presented a new approach for predicting the 2014 Indonesian presidential elections. Their approach collected data from Twitter and preprocessed it by removing usernames and website links. Furthermore, Twitter buzzers (Ibrahim et al., 2015) were eliminated using an automatic technique so that the data came from real Twitter users and unusual noise was avoided. The cleaned data was then subdivided into sub-tweets, each labeled with a candidate's name, and their sentiment polarity was computed. The mean absolute error (MAE) metric was used to evaluate performance, resulting in an MAE of 0.61.

In another study, Bansal & Srivastava (2018) introduced a novel method called Hybrid Topic Based Sentiment Analysis (HTBSA) for forecasting election results using tweets. The tweets were preprocessed using text formatting techniques, and then the topics were generated using the Biterm Topic Model (BTM). HTBSA was conducted based on the sentiments of topics and tweets, resulting in an 8.4% MAE ( Eq. (1) ).
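Eq. (1) is not reproduced in this extract; the MAE reported by these studies is presumably the standard definition over the $n$ candidates, with $\hat{y}_i$ the predicted and $y_i$ the actual vote share:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
```

A lower MAE therefore means the predicted vote shares sit closer, on average, to the official results.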

Similarly, the authors presented a lexicon-based Twitter sentiment analysis for forecasting elections using emoji and n-gram features in 2019 (Bansal & Srivastava, 2019). Unlike previous studies, sentiment polarity was analyzed using non-textual data such as emojis. The authors gathered data from Twitter while restricting themselves to the Uttar Pradesh (UP) geo-location. The data was cleaned of HTML tags, scripts, advertisements, stopwords, punctuation, special symbols, and white space, and duplicate tweets were eliminated. The refined data was then converted to bi-grams and tri-grams, followed by sentiment labeling. Simultaneously, emoji Unicode was matched against the developed n-grams and its sentiment was labeled; both sentiments were then used to calculate the election prediction. Another mathematical algorithm, based on sentiment forecasting for Pakistan's democratic elections, was presented in Nawaz et al. (2022). The authors manually annotated tweets to avoid the implications of spammed data in the dataset. Aspects of the filtered tweets were then extracted by assigning grammatical forms to each word in the sentence. The gathered aspects were associated with opinions using the semantic similarity measure RhymeZone (Whitford, 2014). Once the association was done, Bayes' theorem was applied, classifying tweets with 95% accuracy.

Ontology approach

The authors of Budiharto & Meiliana (2018) forecasted the Indonesian presidential election using tweets from Indonesia's presidential candidates, based on a preprocessing algorithm. The tweets were processed with text formatting techniques, including Indonesian-language stopword removal and special-character elimination. Once the tweets were refined, top words, favorite lines, and retweets were counted, and the authors calculated the polarity of positive, negative, and neutral reviews. In Salari et al. (2018), researchers proposed text and metadata analysis to predict Iran's 2017 presidential elections. The text data was gathered in Persian from two different platforms: Telegram and Twitter messages. The data was then analyzed in several ways: sentiment analysis of hashtags, sentiment analysis of posts using Lexipers (Sabeti et al., 2019), time analysis, and analysis of the number of views and users of each message (Telegram). The first two are text analyses, while the others analyze metadata information. In doing so, the model achieved 97.3% accuracy in predicting the presidential election.

Lexicon based approach

In 2019, Oyebode and Orji conducted sentiment analysis to forecast Nigeria's 2019 presidential election (Oyebode & Orji, 2019). The data was extracted from Nairaland (Nelson, Loto & Omojola, 2018) using a web-scraping approach and preprocessed with text-cleaning techniques. The resultant data was fed to three lexicon-based classifiers (VADER (Hutto & Gilbert, 2014), TextBlob, and Vader-Ext) and used to train five machine learning classifiers: support vector machine (SVM), logistic regression (LR), multinomial naive Bayes (MNB), stochastic gradient descent (SGD), and random forest (RF). When the classifiers were evaluated, the proposed Vader-Ext outperformed all the others with an 81.6% accuracy rate.
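A minimal sketch of how a VADER-style lexicon classifier works: words carry valence scores, a simple negation rule flips the sign, and the summed score maps to a polarity label. The mini-lexicon, the negation rule, and the 0.5 threshold below are illustrative assumptions, not the real VADER resources:

```python
# Toy valence lexicon; scores are invented for illustration.
LEXICON = {"win": 1.5, "support": 1.2, "hope": 0.9, "corrupt": -2.1, "fail": -1.8}
NEGATORS = {"not", "never", "no"}

def classify(tweet, threshold=0.5):
    """Label a tweet positive/negative/neutral from summed word valences."""
    tokens = tweet.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            valence = LEXICON[tok]
            # Flip the valence when the previous token negates it ("not win").
            if i > 0 and tokens[i - 1] in NEGATORS:
                valence = -valence
            score += valence
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

print(classify("I hope this candidate will win"))   # positive
print(classify("he will not win the election"))     # negative
```

Because no training phase is needed, such lexicon methods are cheap to deploy, which is why studies like the one above compare them directly against trained classifiers.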

Supervised learning approach

Machine learning (ML) is a field of computer science that uses statistical techniques to enable computer systems to "learn" from data without being explicitly programmed. ML tasks are commonly divided into supervised, unsupervised, and deep learning approaches. In 2010, the authors presented an automated method that evaluates sentiment via linguistically analyzed documents (Pak & Paroubek, 2010a); those documents were used to train an NB classifier, tested with n-gram features. In 2012, the authors of Wang et al. (2012) proposed a real-time election prediction system that analyzed the opinions of various users on Twitter; the opinions were anatomized and later used to train and test an NB classifier. A unique prediction model for the 2013 elections held in Pakistan was presented in Mahmood et al. (2013). A set of tweets was gathered according to the predictive models, cleaned, and used to train CHAID (chi-squared automatic interaction detector) decision tree (DT), SVM, and NB classifiers. When the classifiers were evaluated on test data, the CHAID decision tree dominated SVM and NB (Eq. (2)).
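The NB-with-n-gram-features setup these studies describe can be sketched compactly. This is a minimal multinomial naive Bayes with unigram features and Laplace (add-one) smoothing; the toy corpus and labels are invented for illustration:

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=1):
    """Extract word n-gram features (unigrams by default)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        best_label, best_logp = None, -math.inf
        total = sum(self.class_counts.values())
        for label in self.class_counts:
            logp = math.log(self.class_counts[label] / total)  # class prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in ngrams(text):
                # Laplace smoothing keeps unseen features from zeroing the product.
                logp += math.log((self.feature_counts[label][f] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

nb = NaiveBayes().fit(
    ["great honest leader", "vote for change", "corrupt liar", "total failure"],
    ["pos", "pos", "neg", "neg"],
)
print(nb.predict("honest leader"))  # pos
```

The surveyed papers differ mainly in the feature set (unigrams vs. higher-order n-grams) and the preprocessing applied before this step, not in the classifier mechanics.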

In 2014, the authors of Razzaq, Qamar & Bilal (2014) proposed a prediction system that evaluated the power of election prediction on the Twitter platform. The authors gathered and preprocessed tweets by eliminating duplicates, URLs, and whitespace, and by manual labeling. Furthermore, Laplace smoothing was used to avoid zero values, and Porter stemming was applied. The processed training data was used to train RF, SVM, and NB classifiers; when tested, NB dominated SVM and RF. Jose & Chooralil (2015) introduced a novel election prediction model using word sense disambiguation. The data was acquired from Twitter and cleansed by removing usernames, hashtags, and special characters. A negation handling technique was also applied to further enhance classification accuracy. Part-of-speech tagging and tokenization were performed on the refined tweets, which were provided to a word sense disambiguation classifier for categorization. The classifier attained a 78.6% accuracy rate.

Tunggawan & Soelistio (2016) presented a predictive model for the 2016 US presidential election. They gathered data from Twitter, which went through simple filtration techniques such as URL and candidate-name removal to make the resultant data precise; in doing so, 41% of the data was eliminated. The data was then labeled manually and fed to an NB classifier (Eq. (3)), which predicted with 54.8% accuracy. Sharma and Moh proposed a supervised election prediction method using sentiment analysis on Hindi Twitter data (Sharma & Moh, 2016). In this method, raw Hindi Twitter data underwent a text-cleaning module that removed negated words, stopwords, special characters, emoticons, hashtags, website URLs, and retweet text. Two supervised (NB, SVM) and one unsupervised (dictionary-based) algorithms were used for classification. The dictionary-based classifier evaluated the tweets with 34% accuracy, whereas the NB and SVM classifiers were trained on 80% of the data; the remaining 20% was used for evaluation, resulting in 62.1% and 78.4% accuracy, respectively.

Ramteke et al. (2016b) presented a two-stage election prediction framework using sentiment analysis on Twitter data and TF-IDF. The data was labeled using hashtag clustering and VADER techniques; 80% of the labeled data was used for training and the remaining 20% for testing the classification algorithm, which achieved a 97% accuracy rate. Cerón-Guzmán and León-Guzmán presented a sentiment analysis approach to the 2014 Colombian presidential election (Cerón-Guzmán & León-Guzmán, 2016). Twitter data was cleaned and normalized in two stages: basic and advanced pre-processing. In the basic pre-processing stage, the data was stripped of URLs, emails, emoticons, hashtags, and special characters. The data was then forwarded to the advanced pre-processing step, where lexical normalization and negation handling techniques further refined it. Once the text was normalized, it was converted to a feature vector, which was later fed to the classifier. The labeled dataset was split into an 80:20 ratio for training and testing; overall, the classifier performed with 60.19% accuracy on the test data. Singh, Sawhney & Kahlon (2017) presented a novel method for forecasting US presidential elections using sentiment analysis. After collecting data from Twitter, the authors applied a restriction to consider only one tweet per user, and all duplicate tweets were removed to avoid interference with the method's performance. Unwanted HTML tags, web links, and special characters/symbols were then removed from the data. The refined data was used to train an SVM classifier, which was then evaluated on test data to classify polarity, attaining a 79.3% accuracy rate.
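TF-IDF weighting, used by Ramteke et al. and several later studies here, can be sketched as follows. The smoothed IDF variant and the toy corpus below are illustrative assumptions; real toolkits differ in normalization details:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights for a small corpus (smoothed-IDF variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors

docs = ["vote for the candidate", "the candidate will win", "boycott the vote"]
vecs = tfidf_vectors(docs)
# "the" appears in every document, so its weight is 0 under this IDF variant,
# while rarer, more discriminative terms like "vote" get positive weight.
print(vecs[0]["the"], vecs[0]["vote"])
```

This down-weighting of ubiquitous terms is why TF-IDF features tend to outperform raw counts when feeding classifiers such as NB or SVM.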

In 2018, Bilal et al. (2018) presented a deep neural network application to forecast the electoral results of Pakistan in 2018. They collected 56,000 tweets about the 2013 general elections and treated them with text-cleaning techniques. The resultant data was used to train a recurrent neural network (RNN), which was then evaluated on test data, resulting in an 86.1% prediction rate. In 2019, a new methodology was presented in Joseph (2019), which predicted the Indian general elections using a decision tree. Data on the ruling and opposing parties was gathered from Twitter; stopwords, regular expressions, emojis, Unicode, and punctuation were pruned from the data. The resultant data was then tokenized and fed to a decision tree classifier for training; on evaluation, the classifier achieved 97.3% accuracy. An efficient method to forecast the Indonesian presidential election using sentiment analysis was presented in Kristiyanti & Umam (2019). The authors collected data from Twitter, tokenized it, and generated bigrams (two-word combinations). Unlike previous research, feature selection was performed using the particle swarm optimization (PSO) algorithm and the genetic algorithm (GA) separately, and those features were used to train an SVM classifier. Once trained, SVM with PSO outperformed SVM with GA, achieving an 86.2% accuracy rate.

Similarly, Oussous, Boulouard & Zahra (2022) proposed another Arabic sentiment analysis framework that forecasted the 2021 Moroccan general election. The data was collected in Arabic from the Hespress website (a Moroccan news website). It was then treated with text-cleaning techniques (tokenization, normalization, and stop-word removal) to reduce the dimensionality and processing time of the framework. Term frequency (TF) was used to build feature vectors, which were then passed to several ML classification models (SVM, NB, AdaBoost, and LR) for training. The classifiers predicted sentiment polarity with accuracy rates of 94.35%, 62.02%, 87.55%, and 88.64%, respectively.

Deep learning approach

Hidayatullah, Cahyaningtyas & Hakim (2021) conducted sentiment analysis using neural networks to predict the 2019 Indonesian presidential election. Two different datasets, collected before and after the elections, were labeled using a pseudo-labeling technique and preprocessed with text-cleaning techniques, including case folding, word normalization, and stemming. The study trained three traditional ML classifiers (SVM, LR, and MNB) and five deep learning classifiers (LSTM, CNN, CNN+LSTM, GRU+LSTM, and bidirectional LSTM), all of which were evaluated on test data. SVM and bidirectional LSTM achieved the best accuracy within their respective categories, but overall, bidirectional LSTM outperformed SVM with an 84.6% accuracy rate. In another study, researchers presented a method for predicting the 2020 USA presidential elections using social media activity (Singh et al., 2021). The dataset was pretreated with text formatting techniques plus TF-IDF vectorization, then fed into NB, SVM, TextBlob, VADER, and BERT classification models for training and testing. Compared to the other classifiers, the BERT classifier prevailed with a 94% precision rate.

In Ali et al. (2022) , the authors introduced a deep learning model to forecast Pakistan general elections based on sentiment analysis. The dataset related to Pakistan general elections 2018 was gathered from Twitter, and they were labelled manually. Then the data was preprocessed with data transformation, tokenization, and stemming. Further, the proposed deep learning model was trained and evaluated with training and test data, respectively, resulting in a 92.47% prediction rate.

Research trends in algorithms and techniques of SM in politics: from beginning to date

Research objective 2: What are the research trends in algorithms and techniques of big social data in different time-based eras?

Era 1 (2010–2017)

In this era, the research community set about inventing and developing election forecasting algorithms to run before the actual elections commenced. The immediate goal of this work was to predict and classify sentiment in digital text with an optimal accuracy rate.

Pak & Paroubek (2010a) presented a novel method to forecast elections using sentiment analysis; the method predicted with 60% accuracy. In 2012, Wang et al. (2012) proposed an election prediction system focusing on 2012 US presidential election data. Unigram features were extracted from 17,000 tweets (the training dataset) and fed to an NB classifier for training; once trained, the classifier was evaluated with a 59% prediction rate. Mahmood et al. (2013) proposed an election prediction method that forecasted the 2013 Pakistan General Election by assessing the CHAID decision tree; the classifier performed with a 90% prediction rate. Razzaq, Qamar & Bilal (2014) presented a machine learning algorithm that predicted positive and negative sentiments with 70% accuracy; however, the method lacked consistency due to a biased dataset. Ibrahim et al. (2015) proposed a statistical prediction method focused on the 2014 Indonesian presidential elections, which performed with a mean absolute error (MAE) of 0.61. There were several limitations to this method: first, a dataset covering voters across all Indonesian provinces should have been considered; second, sentiment analysis (SA) cannot be performed when no keyword is present in candidate-related tweets. In 2015, Jose & Chooralil (2015) proposed an election prediction method using word sense disambiguation. Although the method performed with a 78.6% accuracy rate without a training phase, it was limited by its negation handling and manual labeling.

In Tunggawan & Soelistio (2016), the authors introduced a Bayesian election prediction system focusing on the 2016 US presidential election. Although the system achieved an exceptional accuracy rate on model test data, it under-performed with a 54.8% accuracy rate when evaluated against test poll data. Similarly, Sharma & Moh (2016) presented an Indian election prediction system using a Hindi dataset. The tweets were preprocessed and the polarity of the resultant tweets was calculated; SVM achieved a 78% prediction rate. The system was limited by its lack of emoticon analysis and its need for extensive training data. A sentiment analysis system predicting the 2014 Colombian presidential election was introduced in Cerón-Guzmán & León-Guzmán (2016); it achieved the lowest MAE, about 4%. In Singh, Sawhney & Kahlon (2017), a sentiment analysis system was presented focusing on the 2016 US presidential elections. The system was trained with processed Twitter data and later evaluated with test data, resulting in a 79% accuracy rate. A separate study (Ceron, Curini & Iacus, 2015a) examined the advantages of supervised aggregated sentiment analysis (SASA) on social media platforms for forecasting election outcomes. Analyzing the voting intentions expressed by social media users during several elections in France, Italy, and the United States between 2011 and 2013, they compared 80 electoral forecasts generated through SASA with alternative data-mining and sentiment analysis approaches.

Era 2 (2018–2023)

This era is characterized by the rise of the deep learning approach, which provided a major breakthrough and delivered state-of-the-art results among classification algorithms, giving a new direction to accuracy improvement. Accordingly, the research community focused on creating and developing new deep learning algorithms rather than machine learning algorithms. This era also saw novel sentiment classification proposals relying on statistical, lexicon, and ontology approaches, as seen in Fig. 3.

[Figure 3: Sentiment classification approaches of Era 2 (peerj-cs-09-1517-g003.jpg).]

Bilal et al. (2018) introduced a deep neural network (DNN) election prediction model that forecasted the 2018 Pakistan General Elections with an 86.1% accuracy rate; case sensitivity in the tweets deteriorated the method's performance. Kristiyanti & Umam (2019) proposed a sentiment analysis method to predict the Indonesian presidential election for 2019–2024; the system utilized particle swarm optimization (PSO) and genetic algorithm (GA) techniques with SVM to improve accuracy to 86.2%. Salari et al. (2018) presented an election prediction system for the 2017 Iran presidential election; both text and metadata analysis of the tweets were used to evaluate the system, which performed with a 97.3% accuracy rate without a training phase. Another outcome prediction system, based on the Indian general elections, was presented by Joseph (2019), which trained and tested a DT classifier, resulting in a 97% accuracy rate; however, this system works well only with tweets in the English language.

Chaudhry et al. (2021) proposed an election prediction method mainly focusing on the 2020 US election. They collected Twitter data, preprocessed it, and extracted features using TF-IDF. Features from around 60% of the dataset were used to train an NB classifier, while the remaining 40% was used to evaluate performance, resulting in a 94.58% accuracy rate. However, the authenticity of the dataset (tweets) was not examined, which hurt the method's performance. Likewise, Xia, Yue & Liu (2021) proposed a sentiment analysis-based election prediction method for the same election campaign. The authors preprocessed the tweets with string replacement and stemming techniques, followed by n-gram feature extraction. A multi-layer perceptron (MLP) classified the 27,840-dimension features with an accuracy of 81.53%.

An election prediction method based on a deep learning approach was introduced in Hidayatullah, Cahyaningtyas & Hakim (2021), which forecasted the 2019 Indonesian presidential elections. The authors trained CNN, long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM, SVM, LR, and multinomial NB classifiers, of which bidirectional LSTM dominated the rest by achieving an 84.6% prediction rate. This also implied that DNNs attained a better accuracy rate than traditional machine learning algorithms. Similarly, another deep learning approach was implemented to forecast the 2020 US presidential elections in Singh et al. (2021). Three machine learning algorithms (SVM, NB, and TextBlob) and one deep learning algorithm (BERT) were trained and evaluated; the BERT algorithm attained the highest prediction rate of 94%, again indicating that DNN algorithms achieve better accuracy than conventional machine learning algorithms. Ali et al. (2022) introduced another DNN election prediction method focusing mainly on the 2018 Pakistan General Elections. The data was labeled manually, preprocessed, and tokenized as usual, and the resultant dataset was used to train and evaluate a DNN classifier, resulting in a 92.47% accuracy rate. However, the dataset used in this method was too small, which lowered the accuracy. Previously, traditional polling data was widely considered the most reliable method for forecasting electoral outcomes; however, recent developments have revealed polling data's potential incompleteness and inaccuracy. A study was conducted to compare the accuracy of polls with sentiment analysis results obtained from Twitter tweets (Anuta, Churchin & Luo, 2017). The study analyzed a new dataset of 234,697 politics-related Twitter tweets collected using the Twitter streaming API.
The tweets underwent preprocessing: hashtags, links, and account names were removed, and emoticons and symbols were replaced with their full written form. The study's findings indicated that Twitter exhibited a 3.5% higher bias in popular votes and a 2.5% higher bias in state results compared with traditional polls. Consequently, the study concluded that predictions based on Twitter data were inferior to those based on polling data ( Anuta, Churchin & Luo, 2017 ). The researchers highlighted the limitations of previous methods and recommended incorporating additional techniques, such as POS tagging and word-sense disambiguation, during preprocessing, as well as considering contextual and linguistic features of words to enhance prediction accuracy ( Anuta, Churchin & Luo, 2017 ).
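The preprocessing steps reported by Anuta, Churchin & Luo — stripping links, hashtags, and account names — can be sketched with simple regular expressions; the example tweet below is hypothetical, and the emoticon-replacement step is omitted for brevity:

```python
import re

def clean_tweet(text):
    """Strip links, account names and hashtags, then collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text)   # links
    text = re.sub(r"@\w+", "", text)           # account names
    text = re.sub(r"#\w+", "", text)           # hashtags
    return " ".join(text.split())              # normalise spacing

cleaned = clean_tweet("Go vote! #election2020 @newsdesk https://t.co/abc")
```

In a real pipeline this would precede tokenisation and stemming, and hashtag text is sometimes kept (minus the `#`) rather than dropped, since tags like `#election2020` can carry sentiment-relevant signal.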

In the context of the 2016 US elections, traditional techniques like polling were deemed unreliable due to the rapid evolution of technology and the prevalence of social and digital media platforms ( Hinch, 2018 ). A study analyzed slogans used in tweets during the elections, employing a WordCloud visualization. However, the analysis results were inconsistent with the actual election outcomes, notably failing to predict Trump's victories in Michigan and Wisconsin. The researchers emphasized the need to consider qualitative aspects when making electoral predictions, as the approaches employed in the study failed to capture the underlying dynamics accurately.

The relationship between candidates’ social network size and their chances of winning elections was examined in a study that utilized data from Facebook and Twitter ( Cameron, Barrett & Stewardson, 2016 ). The study employed regression analysis and proposed three models, with the number of votes as the dependent variable and the number of Facebook connections and other factors as independent variables. The results indicated a significant correlation between the size of the social network and the likelihood of winning. However, the effect size was small, suggesting that social media data is predictive only in elections with close competition.
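The regression setup can be illustrated with a minimal one-predictor ordinary least-squares fit. The `connections` and `votes` numbers below are invented, and the study's actual three models included additional independent variables beyond Facebook connections:

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    return beta, my - beta * mx            # slope, intercept

# Hypothetical data: candidates' Facebook connections vs. votes received
connections = [1_000, 5_000, 12_000, 20_000]
votes = [800, 3_900, 9_500, 16_000]
beta, alpha = ols_fit(connections, votes)
```

A positive `beta` corresponds to the correlation the study reports; the point about small effect size is that even a statistically significant slope may move the predicted vote count too little to matter outside close races.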

A study used social network techniques, such as volumetric analysis and sentiment analysis, to infer electoral results for Pakistan, India, and Malaysia ( Jaidka et al., 2019 ). The study collected approximately 3.4 million tweets using the Twitter streaming API and separated English tweets using a natural language toolkit. Volumetric analysis, measuring the volume of tweets for each party; sentiment analysis assessing positive and negative tweets; and social network analysis determining the centrality score of each party were employed. The study found that Twitter data was ineffective for making election predictions in Malaysia but proved effective and efficient for Pakistan and India. Incorporating multiple techniques, the proposed model was also effective for candidates and parties with fewer votes.
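Volumetric analysis in its simplest form just counts party mentions and treats each party's share of tweet volume as a crude vote-share proxy. A sketch, with invented tweets (the party names are real Pakistani parties used only as labels):

```python
from collections import Counter

def volumetric_share(tweets, parties):
    """Share of tweet volume mentioning each party."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for party in parties:
            if party.lower() in text:
                counts[party] += 1
    total = sum(counts.values()) or 1      # avoid division by zero
    return {p: counts[p] / total for p in parties}

tweets = ["PTI rally today", "Vote PMLN!", "PTI wins the debate"]
shares = volumetric_share(tweets, ["PTI", "PMLN"])
```

The study combined this with sentiment analysis and network centrality precisely because raw volume alone cannot distinguish supportive mentions from critical ones.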

A study conducted in 2016 proposed a predictive model for forecasting the outcome of the US presidential elections based on an NB approach using Twitter data ( Tunggawan & Soelistio, 2016 ). The researchers collected tweets over three months, from December to February, and applied simple preprocessing techniques to prepare the data for sentiment analysis. The resulting model achieved an accuracy of 95.8% in sentiment prediction, and a 10-fold cross-validation technique was employed to assess its robustness. The F1 score was used to evaluate the model's accuracy separately on the positive and negative sentiment classes ( Tunggawan & Soelistio, 2016 ). The authors of Heredia, Prusa & Khoshgoftaar (2018) introduced a sentiment analysis model that classified the data with an accuracy of 98.5%; however, when the model's predictions were compared with actual polling data, the results indicated an accuracy of only 54.8%.
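A minimal version of this kind of pipeline — a multinomial NB classifier with Laplace smoothing, scored with per-class F1 — can be sketched as follows. The tiny labelled set is hypothetical; the actual study used far more data and 10-fold cross-validation:

```python
import math
from collections import Counter

class MultinomialNB:
    """Tiny multinomial Naive Bayes with Laplace (add-one) smoothing."""
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        n, v = sum(self.class_counts.values()), len(self.vocab)
        def log_posterior(c):
            total = sum(self.word_counts[c].values())
            return math.log(self.class_counts[c] / n) + sum(
                math.log((self.word_counts[c][w] + 1) / (total + v)) for w in doc)
        return max(self.class_counts, key=log_posterior)

def f1(y_true, y_pred, positive):
    """Per-class F1: harmonic mean of precision and recall for one label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical tokenised tweets with sentiment labels
docs = [["great", "win"], ["love", "win"], ["bad", "loss"], ["hate", "loss"]]
labels = ["pos", "pos", "neg", "neg"]
clf = MultinomialNB().fit(docs, labels)
preds = [clf.predict(d) for d in docs]
```

Reporting F1 per class, as the study did, guards against a classifier that scores high overall accuracy simply by favouring the majority sentiment.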

The statistical analysis of the included research papers revealed interesting insights regarding the distribution of publications across conferences and journals as shown in Fig. 4 . The data indicated that a substantial portion of the research publications were disseminated through conferences, accounting for 64% of the total publications. On the other hand, research papers published in reputable journals constituted 9% of the overall distribution as can be seen in Fig. 4a . This finding highlights the significance of conferences as platforms for rapid knowledge sharing and the enduring impact of journals in disseminating scholarly research.

[Figure 4. Image file: peerj-cs-09-1517-g004.jpg]

The analysis further examined the distribution of research publications across different approaches, and a diverse range of approaches was observed across the reviewed papers. As seen in Fig. 4b , machine learning approaches constituted the highest proportion, appearing in 90% of the publications, followed by deep learning approaches with 20% (the categories evidently overlap, since the shares exceed 100%). This distribution showcases the varied methodologies utilized by researchers within the field and the prominence of certain approaches in contributing to the existing body of knowledge. To determine the years in which authors exhibited the greatest influence through their publications, publication trends from 2010 to 2022 were examined and are presented in Fig. 5 .

[Figure 5. Image file: peerj-cs-09-1517-g005.jpg]

The analysis revealed that 2015 and 2022 had the most publications in this area; articles from 2023 are not included, as their count might change by the end of the year. This indicates an increasing trend in the domain. Moreover, the paper analyzed the distribution of the various categories of approaches in era 1 and era 2, representing different time frames of papers published on the topic, with the aim of investigating how research methodologies and approaches in the field have evolved over time. Fig. 3 presents the distribution of approaches across era 1 and era 2.

The analysis showed that era 1 was dominated by machine learning algorithms, but that a shift towards deep learning approaches began in era 2. Overall, the distribution of approaches has changed between era 1 and era 2, indicating an evolving research landscape. The shift towards deep learning and lexicon approaches suggests a diversification of research methodologies and a broadening of research interests over time. These findings highlight the importance of understanding the temporal dynamics of research approaches and methodologies within the topic, providing valuable insights into the progression and development of the field.

Furthermore, the analysis identified the most relevant and highly cited journals in this domain, as shown in Table 1 . Through a thorough examination of the citations within the reviewed papers, it was found that, out of 42 journals, 15 contained articles with more than a hundred citations. The Social Science Computer Review emerged as the most relevant and most cited journal, followed by First Monday. These journals have consistently published influential research within the domain, indicating their significance as reputable outlets for disseminating scholarly work.

| Journal | Number of publications | Citations |
| --- | --- | --- |
| Cyber Psychology, Behavior, and Social Networking | 1 | 906 |
| Mass Communication and Society | 1 | 186 |
| Political Research Quarterly | 1 | 294 |
| Journal of Broadcasting & Electronic Media | 1 | 165 |
| The International Journal of Press/Politics | 1 | 267 |
| First Monday | 1 | 1023 |
| Association for Computational Linguistics | 1 | 182 |
| Communications of the ACM | 1 | 177 |
| Electoral Studies | 1 | 263 |
| Journal of Big Data | 3 | 297 |
| arXiv | 3 | 257 |
| Journal of Ambient Intelligence and Humanized Computing | 2 | 110 |
| Social Science Computer Review | 2 | 1049 |
| European Journal of Communication | 2 | 752 |
| IEEE Intelligent Systems | 2 | 178 |

Similarly, the analysis identified the most relevant and highly cited conferences in this domain, as presented in Table 2 . The data showed that most papers were submitted to conferences on the web and social media, on artificial intelligence, and on big data. The meta-analysis in this paper further helped compile the ten most highly cited articles, serving as a means to identify publications of significant research interest, listed in Table 3 .

| Conference | Number of publications | Citations |
| --- | --- | --- |
| AAAI Conference on Web and Social Media | 4 | 4023 |
| Conference on System Sciences | 3 | 327 |
| Conference on data mining and advanced computing | 2 | 107 |
| Proceedings of the workshop on semantic analysis in social media | 1 | 376 |
| AAAI conference on artificial intelligence | 1 | 225 |
| Conference on Language Resources and Evaluation | 1 | 4135 |
| Proceedings of the ACL 2012 system demonstration | 1 | 879 |
| Conference on big data | 1 | 118 |
| Conference on inventive computation technologies | 1 | 154 |
| References | Year | Focus of study | Citations |
| --- | --- | --- | --- |
| — | 2010 | Sentiment Analysis on Election Tweets | 4135 |
| — | 2010 | Election Prediction with Twitter | 3646 |
| — | 2012 | Twitter Sentiment Analysis of 2012 US Presidential Election | 879 |
| — | 2011 | Election Prediction with Twitter | 660 |
| — | 2012 | 2011 Dutch Election Prediction with Twitter | 376 |
| — | 2016 | Twitter Sentiment Analysis of 2015 UK General Election | 263 |
| — | 2016 | Election Prediction with Twitter | 154 |
| — | 2018 | Twitter Sentiment Analysis of Indonesia Presidential Election | 147 |
| — | 2016 | Election Prediction from Twitter Using Sentiment Analysis | 136 |
| — | 2016 | Sentiment Analysis on Hindi Twitter | 118 |

The paper has drawn comparisons between the findings of this research and the most relevant works in academia within the field. These comparisons aimed to situate the current study within the existing literature and highlight its contributions. The results align with previous studies that emphasize the importance of conferences and journals in disseminating research findings. Additionally, the prevalence of specific approaches identified in this research aligns with prior works that have identified and discussed these approaches in the literature.

This study holds several implications from theoretical, managerial, and practical standpoints. Theoretical implications include further validating and expanding existing theories and frameworks within the field, particularly in relation to the distribution of research publications and the prevalence of different approaches. The findings of this study contribute to the overall understanding of the research landscape and can serve as a basis for future theoretical developments and investigations.

From a managerial perspective, the results offer insights into the most influential years and the distribution of research approaches. This knowledge can assist managers and decision-makers in understanding the trends and dynamics of the field, enabling them to make informed decisions regarding resource allocation, collaboration opportunities, and strategic planning.

Practically, this research provides valuable guidance for researchers and scholars in terms of selecting appropriate publication outlets and identifying the prevailing approaches in the field. The identification of the most relevant and cited journals and conferences can aid researchers in targeting their work for maximum impact and visibility. Furthermore, knowledge of the most cited papers within the domain helps researchers stay abreast of seminal works and establish connections with influential researchers.

This study provides a detailed analysis of existing sentiment classification techniques in chronological order and categorizes them into statistical, lexicon, ontology, supervised, unsupervised, and deep learning approaches. It can be concluded that deep learning approaches produced promising results. Nevertheless, deep learning introduces new challenges, such as high computational requirements and the need for large training datasets. The review further addresses the existing gap in the literature on election prediction using sentiment analysis of Twitter data. It contributes to the field by thoroughly analyzing existing studies, evaluating the effectiveness of sentiment analysis as a predictive tool, identifying challenges associated with this approach, and discussing the implications and future directions for research. By consolidating the findings, highlighting limitations, and suggesting potential advancements, this review is a valuable resource for researchers, practitioners, and policymakers interested in using sentiment analysis to predict election outcomes and understand public opinion.

While correlations may be observed between specific Twitter trends or sentiment patterns and election outcomes, this does not necessarily imply a causal relationship or a direct influence on the results: merely correlating Twitter data with election results does not mean that the sentiment expressed on Twitter caused the outcome. Other factors, including traditional polling data, campaign strategies, socioeconomic conditions, and voter behavior, may play more significant roles in determining the result. Integrating multiple data sources and carefully considering these other factors is therefore crucial; by doing so, researchers can mitigate this limitation and achieve a more accurate and comprehensive understanding of the dynamics underlying elections.

Moving forward, there are several areas in which sentiment analysis for election prediction can be further explored to enhance the efficiency and accuracy of classification algorithms. Incorporating additional data sources, such as news articles, television transcripts, and survey data, can provide a more comprehensive view of public opinion and enable the development of robust models that mitigate bias within extensive training data. Furthermore, improving sentiment analysis models to encompass diverse source data, and exploring further aspects of the text, including sarcasm, subjectivity, and emotion, can contribute to predicting sentiment with higher precision.

Acknowledgments

We would like to extend our gratitude to Sheikh Bilal Ahmed for his assistance. We are grateful to all the anonymous reviewers for their useful comments.

Funding Statement

The authors received no funding for this work.

Additional Information and Declarations

The authors declare there are no competing interests.

Quratulain Alvi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Syed Farooq Ali conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Sheikh Bilal Ahmed conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Nadeem Ahmad Khan analyzed the data, prepared figures and/or tables, and approved the final draft.

Mazhar Javed conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Haitham Nobanee performed the experiments, authored or reviewed drafts of the article, and approved the final draft.
