Recommendations for applying tests of equivalence

Affiliation.

  • 1 Department of Psychology, York University, Toronto, ON, Canada. [email protected]
  • PMID: 14692005
  • DOI: 10.1002/jclp.10217

Researchers in psychology reliably select traditional null hypothesis significance tests (e.g., Student's t test), regardless of whether the research hypothesis relates to whether the group means are equivalent or whether the group means are different. Tests of equivalence, which have been popular in biopharmaceutical studies for years, have recently been introduced and recommended to researchers in psychology for demonstrating the equivalence of two group means. However, very few recommendations exist for applying tests of equivalence. A Monte Carlo study was used to compare the test of equivalence proposed by Schuirmann with the traditional Student t test for deciding if two group means are equivalent. It was found that Schuirmann's test of equivalence is more effective than Student's t test at detecting population mean equivalence with large sample sizes; however, Schuirmann's test of equivalence performs poorly relative to Student's t test with small sample sizes and/or inflated variances.

Copyright 2003 Wiley Periodicals, Inc. J Clin Psychol.

Publication types

  • Comparative Study

MeSH terms

  • Analysis of Variance
  • Monte Carlo Method
  • Psychological Tests / statistics & numerical data*
  • Psychology, Experimental / statistics & numerical data*
  • Psychometrics / statistics & numerical data*
  • Reproducibility of Results
  • Research / statistics & numerical data*

Equivalent statistics for a one-sample t -test

  • Published: 09 March 2022
  • Volume 55 , pages 77–84, ( 2023 )

  • Gregory Francis
  • Victoria Jakicic


Recent insights into problems with common statistical practice in psychology have motivated scientists to consider alternatives to the traditional frequentist approach that compares p-values to a significance criterion. While these alternatives have worthwhile attributes, Francis (Behavior Research Methods, 49, 1524–1538, 2017) showed that many proposed test statistics for the situation of a two-sample t-test are based on precisely the same information in a given data set; and for a given sample size, one can convert from any statistic to the others. Here, we show that the same relationship holds for the equivalent of a one-sample t-test. We derive the relationships and provide an on-line app that performs the computations. A key conclusion of this analysis is that many types of tests are based on the same information, so the choice of which approach to use should reflect the intent of the scientist and the appropriateness of the corresponding inferential framework for that intent.


Introduction

In traditional hypothesis testing of one sample mean, a null hypothesis indicates the absence of an effect by specifying a particular null value for the population mean, μ 0 . With that null hypothesis and some information about variation from a sample of data, one can predict the sampling distribution of the sample mean if the null hypothesis is true. Then, one can identify the probability that a random sample would be selected that produces a mean value more extreme than the observed mean. Typically, this probability, known as a p -value, is compared to a criterion value, α , so that if p < α , one concludes that the observed sample mean must be among the “rare” outcomes if the null hypothesis is true. Rare outcomes are unusual (by definition), so scientists infer that the null hypothesis is false. A strength of this approach is that, in ideal situations, it controls the Type I error rate (the probability of rejecting a true null hypothesis) because rare sample means that correspond to p < α occur with a probability of α .
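As a concrete illustration of this decision procedure, the whole calculation can be carried out with base R's t.test() (a minimal sketch using made-up numbers, not data from any study discussed here):

# hypothetical sample of n = 8 scores; H0: mu = 5
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2)
res <- t.test(x, mu = 5)   # two-sided one-sample t-test
res$statistic              # observed t-value
res$p.value                # probability of a sample mean at least this extreme under H0
res$p.value < 0.05         # reject H0 only if p falls below the criterion alpha = .05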

The last decade has highlighted misuses (and abuses) of hypothesis testing (e.g., Kruschke, 2010; Simmons et al., 2011; Nuijten et al., 2016; García-Pérez, 2017). It has become clear that many scientists (perhaps unknowingly) engage in multiple testing, improper sampling, and improper reporting of results; and that these behaviors undermine the Type I error control provided by the hypothesis testing process. Recognition of these problems has highlighted long-running concerns about null hypothesis significance testing (Branch, 2014; Craig & Abramson, 2018; Gelman, 2017; Szucs & Ioannidis, 2017). One common suggestion has been to use other statistics in place of the p-value, partly because the p-value is deemed "unreliable" or too "noisy" to be of much use (e.g., Cumming, 2014). The journal Basic and Applied Social Psychology banned null hypothesis testing procedures (Trafimow & Marks, 2015) and instead now encourages reporting of descriptive statistics and standardized effect sizes. Other scientists have suggested replacing traditional hypothesis testing with Information Criterion model comparison methods (e.g., Glover & Dixon, 2004) or with Bayesian methods (e.g., Nathoo & Masson, 2016; Ortega & Navarrete, 2017; Rouder et al., 2009; Kruschke, 2010).

We see legitimate advantages to these alternative statistical approaches, and we encourage readers to investigate them and to apply them if appropriate. Nevertheless, we feel it is important to recognize the close relationships between different statistics. Francis (2017) showed mathematical equivalence for many statistics in the situation corresponding to a two-sample t-test. These statistics include: values used for the traditional hypothesis test such as t- and p-values; standardized effect sizes, namely Cohen's d and Hedges' g; confidence intervals for standardized effect sizes; differences of Information Criterion calculations; and a commonly used Bayes factor. These statistics are mathematically equivalent representations of the data set's information, and the analysis methods associated with each statistic only differ in how the data set's information is interpreted. In this paper, we show that similar relationships hold for statistics associated with a one-sample t-test. Provided one knows the sample size, the fifteen statistics in Table 1 give the same information about the data set, and one can mathematically transform any one of the statistics into another statistic. An online app to do the conversions is provided at http://psych.purdue.edu/gfrancis/EquivalentStatistics/index_oneSample.html .

Equivalence of statistics

For a one-sample t-test, we have a null hypothesis, given as H0: μ = μ0, where μ is a population mean and μ0 is a specific value, and an alternative hypothesis, denoted as HA: μ ≠ μ0. (One can also consider directional alternative hypotheses.) In traditional frequentist hypothesis testing, the test statistic is derived from a sample of data as:

\[ t = \frac{\overline{X} - \mu_{0}}{s / \sqrt{n}} \]
where \(\overline {X}\) is the mean of the sample, s is the standard deviation of the sample, and n is the sample size. We will show in the following sections how each of the other terms in Table  1 can be computed from the t -value and the sample size.

Without loss of generality, we assume a positive t -value, which we refer to as an “unsigned” t -value. This assumption does not lose generality because whether \(\overline {X}\) is greater than or less than μ 0 depends on the substantive meaning of the measurements (e.g., do larger values of \(\overline {X}\) correspond to “more” or “less” of something of interest?); and the meaning is not part of the data set. In particular, one could multiply each score in a population and in a sample by -1 and thereby change the sign of the t -value without changing the statistical inferences or interpretation of the data.

The p -value is the area under the null hypothesis sampling distribution more extreme than ± t . Thus, for a given sample size, the unsigned t statistic has a one-to-one relation with the p -value; and for a given p -value it is possible to compute the corresponding unsigned t -value. The same relation holds for p -values from one-tailed tests. To properly interpret a one-tailed hypothesis test, a scientist must know that the p -value is from a one-tailed test and the direction of the observed difference, so we assume this information is known.
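In R, this one-to-one mapping between the unsigned t-value and the two-sided p-value can be sketched with the base t-distribution functions (using the n = 27, t = 2.18 example that appears later in the text):

n <- 27; t_obs <- 2.18; df <- n - 1
p <- 2 * pt(-abs(t_obs), df)    # unsigned t-value to two-sided p-value
t_back <- qt(1 - p / 2, df)     # two-sided p-value back to the unsigned t-value
c(p = p, t_recovered = t_back)  # t_recovered equals 2.18 again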

Standardized effect sizes

For the case of a one-sample t-test, the population standardized effect size is:

\[ \delta = \frac{\mu - \mu_{0}}{\sigma} \]
where μ is the population mean, μ0 is the value in the null hypothesis, and σ is the population standard deviation. For a set of data, the estimated standardized effect size is Cohen's d:

\[ d = \frac{\overline{X} - \mu_{0}}{s} \]
which uses the estimated mean and standard deviation from the sample. The statistic d is what is called a “sufficient” statistic, which means it contains all the information about δ that is available in the data set. Knowing the full data set wouldn’t provide us with any more information about δ .

It takes only a bit of algebra to see that Eqs. 1 and 3 are closely related so that:

\[ t = d \, \sqrt{n} \]
Thus, for a known sample size, a given value of t is also a sufficient statistic of δ ; and since a unique p -value corresponds to each unsigned t -value, the p -value is also a sufficient statistic of unsigned δ .

For small sample sizes, d tends to overestimate the population standardized effect size (Hedges, 1981), so sometimes scientists prefer to report Hedges' g:

\[ g = d \left(1 - \frac{3}{4(n-1) - 1}\right) \]
This equation is an (excellent) approximation of a more complicated formula involving gamma functions (Hedges, 1981), which is also just a function of the sample size. Goulet-Pelletier and Cousineau (2018) provide a nice review of standardized effect sizes and their estimates. Clearly, Hedges' g is also uniquely determined by the value of t (and thus of p, up to the arbitrary sign), provided the sample size is known.
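Both conversions are one-liners in R. In this sketch the Hedges correction is written with the standard small-sample approximation for df = n − 1; treat that exact constant as our assumption rather than a quotation of the paper's equation:

n <- 27; t_obs <- 2.18
d <- t_obs / sqrt(n)                  # Cohen's d recovered from t and n
g <- d * (1 - 3 / (4 * (n - 1) - 1))  # approximate Hedges' g correction (assumed form)
c(d = d, g = g)                       # d is about 0.42 for these values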

Confidence interval of a standardized effect size

Each endpoint of a confidence interval for a standardized effect size has a one-to-one relationship with the standardized effect size and, thus, to t - and p -values as well. This close relationship is because the sampling distribution of a standardized effect size is a non-central t distribution that depends only on the corresponding t -value and the sample size. There is no simple formula to compute the confidence interval, but numerical techniques are easy to apply. The computation is reversible, so if the sample size is known, then one has as much knowledge about the data set by knowing just the upper limit of a confidence interval as by knowing the d , t -, or p -value of the sample. Likewise, knowing d and the sample size means that one already has sufficient information to compute the confidence interval of d .
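One common numerical recipe (a sketch of the general idea, not necessarily the exact routine behind the authors' app) searches for the noncentrality parameters that place the observed t-value at the appropriate percentiles of a noncentral t distribution, then rescales by the square root of n:

# confidence interval for the standardized effect size delta, from t and n only
ci_delta <- function(t_obs, n, conf = 0.95) {
  df <- n - 1
  a  <- (1 - conf) / 2
  # lower limit: noncentrality at which the observed t sits at the (1 - a) percentile
  lo <- uniroot(function(ncp) pt(t_obs, df, ncp) - (1 - a),
                interval = c(-50, 50), extendInt = "yes")$root
  # upper limit: noncentrality at which the observed t sits at the a percentile
  hi <- uniroot(function(ncp) pt(t_obs, df, ncp) - a,
                interval = c(-50, 50), extendInt = "yes")$root
  c(lower = lo, upper = hi) / sqrt(n)  # convert noncentrality limits to delta units
}
ci_delta(2.18, 27)  # roughly (0.02, 0.81), matching the worked example given later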

Post hoc power

Power refers to the probability that a null hypothesis would be rejected given a specified value for the population (often given as a standardized effect size). Post hoc power supposes that the value of the population matches what was reported in an original experiment. The power value is then the probability that a replication experiment with the same sample size and design would reject the null hypothesis. Computing power requires knowing the sample size and the standardized effect size. We saw above that d (or g ) can be computed from t or from p , so post hoc power can be directly computed from those terms [this relationship was previously noticed by Hoenig and Heisey ( 2001 )]. One gets slightly different values when using d or g , but the calculation is invertible (up to the sign of t , d , and g ); so knowing post hoc power for a data set allows computation of all the other statistics in Table  1 .
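The calculation is again a few lines of R for the two-sided one-sample case (a sketch; the noncentrality parameter treats the reported d as if it were the true population effect):

posthoc_power <- function(d, n, alpha = 0.05) {
  df    <- n - 1
  ncp   <- d * sqrt(n)            # noncentrality implied by taking d as the population value
  tcrit <- qt(1 - alpha / 2, df)  # two-sided critical t-value
  # probability of landing in either rejection region under the noncentral t
  pt(tcrit, df, ncp, lower.tail = FALSE) + pt(-tcrit, df, ncp)
}
posthoc_power(d = 0.42, n = 27)   # post hoc power implied by the d = 0.42, n = 27 example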

Log likelihood ratio

One approach to statistical inference is to compute the likelihood of observed data for different models. The preferred model is then the one with the higher likelihood for the data. In the case of log likelihood ratios, the goal is to determine which of two models best fits the sample data. When dealing with a one-sample t-test, it is possible to set up models similar to the null and alternative hypotheses. For the null model, one supposes that each subject will have an observed score, denoted as \(X_i\). This score could come from a null (sometimes called a "reduced") model:

\[ X_{i} = \mu_{0} + \epsilon_{i} \]
where μ0 is the null hypothesized population mean and \(\epsilon_i\) is random noise from a Gaussian distribution, \(N(0, \sigma_{R}^{2})\), with an unknown standard deviation, which is estimated from the data. The likelihood of observed data \([X_1, X_2, \ldots, X_n]\) for the reduced (null) model is computed as

\[ L_{R} = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \hat{\sigma}_{R}^{2}}} \exp\left(-\frac{(X_{i} - \mu_{0})^{2}}{2\hat{\sigma}_{R}^{2}}\right) \]
which is the product, across all data points, of the height of the Gaussian distribution with the corresponding mean and standard deviation of the reduced/null model. Here, the estimated standard deviation is the value that maximizes likelihood for the given mean, μ 0 , and the data points. This calculation is commonly known as the “population” formula for standard deviation, which is a biased estimator for the population standard deviation but generates the maximum likelihood values for the given data set. Even though the calculation of a t -value uses the unbiased estimate of standard deviation while maximum likelihood calculations use biased estimates of standard deviation, we show below that there is a simple formula connecting the statistics.

The alternative (full) model is more flexible in that it allows estimation of both the standard deviation and the mean. It assumes that each data point comes from a model:

\[ X_{i} = \mu + \epsilon_{i} \]
where \(\epsilon_i\) is random noise from a Gaussian distribution, \(N(0, \sigma_{F}^{2})\). The likelihood of data relative to this model is:

\[ L_{F} = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \hat{\sigma}_{F}^{2}}} \exp\left(-\frac{(X_{i} - \hat{\mu})^{2}}{2\hat{\sigma}_{F}^{2}}\right) \]
where \(\hat {\mu } = \overline {X}\) , the typical mean of the sample (which maximizes likelihood), and \(\hat {\sigma }_{F}\) is the standard deviation of the sample relative to the sample mean (again using the “population” formula to maximize likelihood).

To test which model best predicts the observed data, the natural logarithm is taken of a ratio of the two likelihood functions:

\[ \Lambda = \ln\left(\frac{L_{F}}{L_{R}}\right) \]
Note that the null model is a special case of (is nested within) the full model. The full model has two free parameters (mean and standard deviation) compared to the reduced model having one free parameter (standard deviation). Because of the full model’s increased flexibility, we will always find that L F ≥ L R , and thus, Λ ≥ 0. To determine which model to support based on the value of Λ, it is common practice to pick a criterion threshold, much like one does with other statistics (e.g. p -values).

The log likelihood ratio has a one-to-one relationship with the unsigned t-value (Kendall & Stuart, 1961). The log likelihood ratio for the one-sample t-test case is:

\[ \Lambda = \frac{n}{2} \ln\left(1 + \frac{t^{2}}{n-1}\right) \]

Thus, we can also compute the unsigned t statistic from Λ:

\[ t = \sqrt{(n-1)\left(e^{2\Lambda/n} - 1\right)} \]
Thus, Λ provides exactly the same information about a data set as the other statistics in Table  1 . The relationships in Eqs. ( 12 ) and ( 13 ) are based on maximum likelihood estimates, but sometimes scientists use other statistics for computing likelihoods. In at least one such case, there is, again, a direct relationship between t -values and Λ (Cousineau & Allan, 2015 ) that follows a different formula.
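A short R sketch (with a made-up sample) illustrates both routes to Λ: the direct maximum-likelihood computation from the data and the closed-form conversion from the t-value, which agree up to rounding:

x   <- c(4.7, 5.6, 5.1, 5.9, 5.3, 4.9, 5.8, 5.4)  # hypothetical data
mu0 <- 5
n   <- length(x)
# maximum-likelihood ("population") standard deviations for each model
sd_R <- sqrt(mean((x - mu0)^2))       # reduced model: mean fixed at mu0
sd_F <- sqrt(mean((x - mean(x))^2))   # full model: mean estimated from the data
lambda_direct <- sum(dnorm(x, mean(x), sd_F, log = TRUE)) -
                 sum(dnorm(x, mu0,     sd_R, log = TRUE))
# the same quantity from the t statistic alone
t_obs <- unname(t.test(x, mu = mu0)$statistic)
lambda_from_t <- (n / 2) * log(1 + t_obs^2 / (n - 1))
c(direct = lambda_direct, from_t = lambda_from_t)
# and the unsigned t-value recovered from Lambda
sqrt((n - 1) * (exp(2 * lambda_from_t / n) - 1))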

Model selection using an information criterion

When using the log likelihood ratio to choose between models, one runs the risk of over-fitting the data by creating too complex of a model that ends up "predicting" data variation that is actually due to random noise. To quantify the risks associated with complex models, Akaike (1974) derived what is now called the Akaike Information Criterion (AIC), which takes into account the number of parameters in a model when judging which model fits the data set best. For a given model, the AIC value is:

\[ AIC = 2m - 2\ln(L) \]
where m is the number of parameters for the model and L is the likelihood of the observed data for that model. Smaller (more negative) values indicate better fit of the model to the data.

In order to determine which model fits the data better with this criterion, the AIC associated with the full (alternative) model is subtracted from the AIC associated with the reduced (null) model. For the situation corresponding to a one-sample t-test, the reduced (null) model has m = 1, while the full (alternative) model has m = 2. Thus, we compute:

\[ {\Delta}AIC = AIC_{R} - AIC_{F} = 2\ln\left(\frac{L_{F}}{L_{R}}\right) - 2 = 2{\Lambda} - 2 \]
If Δ A I C > 0, then the full model is preferred, and if Δ A I C < 0, then the reduced model is preferred. Sometimes researchers require a big enough difference (e.g., Δ A I C > 2 or Δ A I C < − 2) before claiming preference for one model.

The term on the far right of Eq.  15 indicates that Δ A I C for a one-sample t -test is easily computed from Λ. We already know that Λ can be computed from t and the sample size, so Δ A I C is based on precisely the same information as a t - or p -value.

For small samples, ΔAIC tends to favor complex models (i.e., those with more parameters). Hurvich and Tsai (1989) developed a formula that further penalizes complex models with small sample sizes and many parameters:

\[ AIC_{c} = AIC + \frac{2m(m+1)}{n - m - 1} \]
Much like ΔAIC, ΔAICc is computed by taking the difference in AICc for the reduced and full models. For the one-sample t-test case, the formula is:

\[ {\Delta}AIC_{c} = {\Delta}AIC + \frac{4}{n-2} - \frac{12}{n-3} = 2{\Lambda} - 2 + \frac{4}{n-2} - \frac{12}{n-3} \]
When Δ A I C c < 0, then the data favors the reduced model, and when Δ A I C c > 0, the data favors the full model. As the sample size n gets large, Δ A I C c will converge to Δ A I C .

Schwarz (1978) developed another criterion for model selection called the Bayesian Information Criterion (BIC). For a model with m parameters, the BIC is:

\[ BIC = m\ln(n) - 2\ln(L) \]
Much like ΔAIC, we compute the difference in the BIC by subtracting the BIC for the full model from the BIC for the reduced model. Thus, for a one-sample t-test:

\[ {\Delta}BIC = BIC_{R} - BIC_{F} = 2{\Lambda} - \ln(n) \]
As before, if Δ B I C > 0, then the full model is preferred, and similarly, if Δ B I C < 0, then the reduced model is preferred.

As just shown, the A I C c and BIC can be computed from Λ, and Λ can be computed from t . Thus, these two model selection statistics are derived from the same information given by the other statistics in Table  1 .
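Collecting these relations, a small R helper computes all three model-selection differences from nothing more than the t-value and the sample size (a sketch based on the equations above, with m = 1 parameter in the reduced model and m = 2 in the full model):

ic_differences <- function(t_obs, n) {
  lambda <- (n / 2) * log(1 + t_obs^2 / (n - 1))    # log likelihood ratio
  dAIC   <- 2 * lambda - 2                          # AIC_reduced - AIC_full
  dAICc  <- dAIC + 4 / (n - 2) - 12 / (n - 3)       # small-sample corrected difference
  dBIC   <- 2 * lambda - log(n)                     # BIC_reduced - BIC_full
  c(dAIC = dAIC, dAICc = dAICc, dBIC = dBIC)
}
ic_differences(t_obs = 2.12, n = 200)  # reproduces the values discussed in the Conclusions
                                       # (about 2.47, 2.43, and -0.83)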

JZS Bayes Factor

A Bayes factor computes likelihoods for null and alternative models, much like a likelihood ratio, but it computes mean likelihood across all model parameter values defined by a prior probability distribution. Rouder et al., ( 2009 ) argued that a convenient (both in terms of computability and in meeting the needs of practicing scientists) prior is based on a Cauchy distribution for a standardized effect size. This prior distribution is derived from priors placed on other parameters that were previously suggested by other authors, so it is called a Jeffreys, Zellner, and Siow (JZS) prior. For the case of a one-sample t -test, Rouder et al. showed that such a prior results in a relatively straightforward calculation of the JZS Bayes factor that depends only on the t -value and the sample size.

The calculation is invertible (up to an unknown sign term), and thus, there is a one-to-one relation between the JZS Bayes factor, the unsigned t -value, and the p -value. The broadness of the prior for the alternative hypothesis can be easily adjusted with a scale term, r . An often reasonable default value is \(r=\sqrt {2}/2\) , but the relationship between an unsigned t -value and the corresponding JZS Bayes factor holds for every scale term. Interpreting a Bayes factor ( BF ) requires knowledge of the properties of the prior, so the scale value should be known for any given situation.
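As an illustration, the BayesFactor package for R exposes this computation directly from the t-value and sample size; we assume here that its ttest.tstat() helper (which returns the natural log of the JZS Bayes factor in its bf element) matches the calculation described by Rouder et al.:

# assumes the BayesFactor package is installed: install.packages("BayesFactor")
library(BayesFactor)
out <- ttest.tstat(t = 2.18, n1 = 27, rscale = sqrt(2) / 2)  # one-sample: only n1 given
exp(out$bf)   # JZS Bayes factor for the alternative over the null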

We have shown that, provided the sample size n is known for a one-sample two-sided t -test, then the unsigned t -value, p -value, unsigned Cohen’s d , unsigned Hedges’ g , the limits of a confidence interval of unsigned Cohen’s d or Hedges’ g , post hoc power derived from Cohen’s d or Hedges’ g , the log likelihood ratio, Δ A I C , Δ A I C c , Δ B I C , and the JZS Bayes factor value give equivalent information about the data set. This equivalence means that debates over which statistic provides the “most” information about the data set should end, as they all provide the same mathematical information. Instead, progress should focus on how these statistics are interpreted. Since a p -value contains as much information about the data set as the statistics for other inferential frameworks, critiques of the p -value should either rest in the interpretation of the p -value or should apply to all of the statistics in Table  1 .

Inference in different frameworks

Since all the statistics in Table 1 convey the same information about a data set, how should a scientist choose between them? One answer is simple: share the statistic that makes it easy for the reader to understand the information. For example, if an article describes a study with n = 27 and t = 2.18, it is redundant to also say that p = 0.04, d = 0.42, and the 95% confidence interval of d is (0.02, 0.81). Nevertheless, such redundancy can be helpful to readers by making explicit what is otherwise implicit information.

Still, one would hardly expect a paper to describe all fifteen statistics in Table  1 , and such reporting would generally not be appropriate because the different statistics derive meaning only within their respective inferential frameworks. Indeed, a key implication of the ability to transform the statistics in Table  1 is that various inferential frameworks are based on fundamentally different interpretations of common information. Here, we briefly summarize those interpretations for different frameworks so that readers can determine which framework is most appropriate for their particular research.

Frequentist hypothesis testing

The t - and p -values are part of traditional hypothesis testing approaches. The fundamental inference is whether to reject the null hypothesis ( p < α ). A key component of this decision making process is that it, in ideal settings, controls the Type I error rate (rejecting a true null hypothesis).

Standardized effect size

The d and g estimates of the standardized effect size, and their confidence intervals, provide information about the distance (in units of standard deviation) between the population mean and the value in the null hypothesis. This signal-to-noise ratio value is useful because it gives an indication of how easy it is to generate a sample that distinguishes between the presence or absence of an effect.

Power is most useful for planning an experiment. There is little point in running an experiment with low power. Post hoc power (whether based on d or g values) gives an estimate of the probability that a replication study with the same sample size will produce a similar decision outcome.

Model comparison methods

Rather than focus on controlling Type I error, model comparison methods try to identify which model best fits observed data. The most straightforward method is the (log) likelihood ratio, but it has the property that a more complicated model always has a better fit to a dataset than a simpler model. This property is unappealing because the complicated model will “fit” noise as well as signal in the dataset. Such overfitting will cause the more complicated model to poorly fit new data drawn from the same population because the new data set will have a different pattern of noise.

The Δ A I C and Δ A I C c statistics include complexity penalties in such a way that (in ideal situations) the preferred model will better predict new replication data. Given the importance of replication in scientific investigations (Earp and Trafimow, 2015 ; Zwaan et al., 2018 ), these statistics seem like they would be of interest to many scientists. One advantage of this approach compared to frequentist hypothesis testing is that it can provide support for the null model, compared to the alternative model.

There are some disadvantages to this approach. One disadvantage is that it does not control the Type I error rate, which many scientists feel is important. A second disadvantage is that, even with large sample sizes, the Δ A I C statistic is not guaranteed to select the correct model. Indeed, these methods tend to have a bias toward selecting models that are more complicated than reality.

The Δ B I C statistic addresses some of these concerns. In particular, if the true model is one that is being considered, Δ B I C will select it, at least for large enough sample sizes. Of course, if the true model is not among the candidates being compared, no model selection method can identify it.

Whether one prefers Δ A I C or Δ B I C partly depends on what kinds of models a scientist believes are being compared. If the scientist wants to identify a model that best predicts future data, then Δ A I C is the better choice, even though the resulting model may not be the true model (or even a good model). If the scientist believes that the true model is under consideration, then Δ B I C is the better choice because it will (for a large enough sample) almost surely be identified as the best model (Burnham & Anderson, 2004 ).

Bayes factors

The Bayes factor is also a model comparison statistic, but it has a different perspective than the information criterion methods. Whereas the information criterion methods use the current data to predict future data, the Bayes factor directly compares competing models with the current data. It does this by including a prior distribution of parameters that are part of the model. The prior represents uncertainty, which can be resolved by data. Similar to Δ B I C , a Bayes factor will (given a sufficiently large sample) identify the true model, if it is under consideration. In addition, being able to specify a prior allows the Bayes factor approach to compare complicated models that are not possible with Δ B I C or Δ A I C .

Relationships between inferential frameworks

Even though they have radically different inferential interpretations, all of the statistics in Table  1 are based on the very same information in the data set. Thus, for scientists not familiar with the statistics, it might be fruitful to consider how they are related to each other. We do this by considering values of statistics that correspond to commonly used inferential thresholds.

Figure  1 plots p -values as a function of sample size ( n ∈{10,…,400}). Each curve plots the p -values corresponding to a commonly used criterion for other inferential frameworks. Figure  1 A plots the p -values corresponding to criterion JZS BF values. The value B F = 3 is often taken to provide the minimal amount of support for the alternative hypothesis, while \(BF = \frac {1}{3}\) is often taken to provide the minimal amount of support for the null hypothesis. B F = 1 indicates equal support for either hypothesis. The JZS BF is more conservative than the p < 0.05 criterion in the sense that the p -values associated with B F = 3 for n > 10 are all less than 0.025. As sample size increases, even smaller p -values are needed to find minimal support for the alternative hypothesis. On the other hand, for \(BF = \frac {1}{3}\) , we find that the associated p -values for small sample sizes are quite large; however, the p -values do decrease as sample size increase. For large samples, what might be considered as “nearly significant” p -values are treated by the Bayes Factor as evidence for the null hypothesis.

Figure 1. p-values that correspond to different criterion values for other inferential statistics. (A) p-values that correspond to JZS BF = 3, JZS BF = 1, and JZS BF = 1/3. (B) p-values that correspond to ΔAIC = 0 and ΔAICc = 0. (C) p-values that correspond to ΔBIC = 0, ΔBIC = 2, and ΔBIC = −2. (D) p-values on log-log plots that correspond to the different criterion values included in (A), (B), and (C)

Figure 1B plots the p-values corresponding to ΔAIC = 0 and ΔAICc = 0, the criterion that separates support for the reduced and full models. In general, these information criteria are more lenient than using p < 0.05. Starting at n = 120, the p-value becomes bounded between 0.15 and 0.16, so if a data set produces p < 0.16, then one could interpret that as providing better support for the alternative as compared to the null.
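This bound is easy to check: ΔAIC = 0 corresponds to Λ = 1, which can be inverted to a t-value and then to a p-value (a minimal R sketch for the ΔAIC = 0 curve only):

p_at_dAIC0 <- function(n) {
  t_bound <- sqrt((n - 1) * (exp(2 / n) - 1))  # t-value at which Lambda = 1, i.e. dAIC = 0
  2 * pt(-t_bound, n - 1)                      # corresponding two-sided p-value
}
p_at_dAIC0(120)   # just under 0.16
p_at_dAIC0(400)   # drifts toward 2 * (1 - pnorm(sqrt(2))), roughly 0.157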

Figure  1 C plots the p -value against Δ B I C = 0, Δ B I C = 2, and Δ B I C = − 2, which are common criteria that give equal support to both models (Δ B I C = 0) or provide minimal evidence for the full model (Δ B I C = 2) or reduced model (Δ B I C = − 2). Δ B I C is more lenient with small sample sizes and more strict with larger sample sizes than the typical p < 0.05 criterion.

In Fig.  1 D, the p -values for the various inference criteria are plotted on a log-log graph. Much like what was found in Francis ( 2017 ) for the two-sample t -test, Δ A I C and Δ A I C c are the most lenient of the different criteria for indicating support for the alternative hypothesis, and the JZS BF is the most strict.

The graphs provided in Fig.  1 were created with “minimal” criterion values in mind. Whether a method is conservative or lenient with regard to supporting the alternative hypothesis (or rejecting the null) depends on somewhat arbitrary criteria that are not an inherent part of the inferential framework. Still, we think it is interesting to notice how the criteria from one framework map onto a different framework. This mapping may help scientists better understand the relationships between the frameworks and the impact of the criteria they use.

Conclusions

Over the past years, the field of psychology has sought ways to improve data analysis, the reporting of results, and the development of theories; and one common approach is to use new methods of statistical inference. However, as Table  1 indicates, many of the statistics commonly utilized are mathematically equivalent. This equivalence does not mean that the various analysis methods are exactly the same, and there are cases where the methods may disagree with each other given the same research method. For example, a sample of n = 200 with estimated d = 0.15 corresponds to t = 2.12 and p = 0.035, which indicates that a scientist should reject the null hypothesis. Consistent with that interpretation, the AIC model comparison approach gives Δ A I C = 2.47 and Δ A I C c = 2.43, which indicates that the alternative model will better predict future data than the null model. However, the JZS Bayes factor gives B F = 0.71 and Δ B I C = − 0.83, which is modest support for the null hypothesis. Thus, it is definitely not the case that all the statistics give the same answer, even though they are mathematically related to each other. Such disagreement is appropriate because the different inferential frameworks ask (and answer) different questions. When deciding how to interpret their data, scientists should carefully think about the questions being addressed by their statistics.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control , 19 , 716–723.

Branch, M. (2014). Malignant side effects of null-hypothesis significance testing. Theory & Psychology , 24 (2), 256–277.

Burnham, K.P., & Anderson, D.R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research , 33 , 261–304.

Cousineau, D., & Allan, T. (2015). Likelihood and its use in parameter estimation and model comparison. Mesure et évaluation en éducation, 37(3), 63–98. https://doi.org/10.7202/1036328ar .

Craig, D.P.A., & Abramson, C.I. (2018). Ordinal pattern analysis in comparative psychology: a flexible alternative to null hypothesis significance testing using an observation oriented modeling paradigm. International Journal of Comparative Psychology , 31 , 1–21.

Cumming, G. (2014). The new statistics: Why and how. Psychological Science , 25 (1), 7–29.

Earp, B.D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology , 6 (621), 1–11.

Francis, G. (2017). Equivalent statistics and data interpretation. Behavior Research Methods, 49, 1524–1538.

García-Pérez, M.A. (2017). Thou shalt not bear false witness against null hypothesis significance testing. Educational and Psychological Measurement , 77 (4), 631–662.

Gelman, A. (2017). The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin , 44 (1), 16–23.

Goulet-Pelletier, J.-C., & Cousineau, D. (2018). A review of effect sizes and their confidence intervals Part I: The Cohen’s d family. The Quantitative Methods for Psychology , 14 (4), 242–265. https://doi.org/10.20982/tqmp.14.4.p242 .

Glover, S., & Dixon, P. (2004). Likelihood ratios: a simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review , 11 , 791–806.

Hedges, L.V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics , 6 , 107–128.

Hoenig, J.M., & Heisey, D.M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician , 55 (1), 1–6.

Hurvich, C.M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika , 76 , 297–307.

Kendall, M.G., & Stuart, A. (1961). The advanced theory of statistics (Vol. 2). Hafner Publishing Company.

Kruschke, J.K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science , 1 (5), 658–676.

Nathoo, F.S., & Masson, M.E.J. (2016). Bayesian alternatives to null-hypothesis significance testing for repeated-measures designs. Journal of Mathematical Psychology , 72 , 144–157.

Nuijten, M.B., Hartgerink, C.H.J., van Assen, M.A.L.M., Epskamp, S., & Wicherts, J.M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48, 1205–1226.

Ortega, A., & Navarrete, G. (2017). Bayesian hypothesis testing. An alternative to null hypothesis significance testing (NHST) in psychology and social sciences. In J. P. Tejedor (Ed.) Bayesian Inference. IntechOpen, https://doi.org/10.5772/intechopen.70230 .

Rouder, J.N., Speckman, P.L., Sun, D., Morey, R.D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review , 16 , 225–237.

Schwarz, G.E. (1978). Estimating the dimension of a model. Annals of Statistics , 6 , 461–464.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science , 22 (11), 1359–1366.

Szucs, D., & Ioannidis, J.P.A. (2017). When null hypothesis significance testing is unsuitable for research: a reassessment. Frontiers in Human Neuroscience , 11 , 390.

Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology , 37 (1), 1–2.

Zwaan, R.A., Etz, A., Lucas, R.E., & Donnellan, M.B. (2018). Making replication mainstream. Behavioral and Brain Sciences , 41 , 1–50.

Author information

Authors and affiliations.

Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN, 47907-2004, USA

Gregory Francis & Victoria Jakicic

Corresponding author

Correspondence to Gregory Francis .


Francis, G., Jakicic, V. Equivalent statistics for a one-sample t -test. Behav Res 55 , 77–84 (2023). https://doi.org/10.3758/s13428-021-01775-3

Accepted : 15 December 2021

Published : 09 March 2022

Issue Date : January 2023

DOI : https://doi.org/10.3758/s13428-021-01775-3


  • Bayes factor
  • Hypothesis testing
  • Model building
  • Parameter estimation

APS

Equivalence Testing With TOSTER

  • Experimental Psychology
  • Methodology

Any science that wants to be taken seriously needs to be able to provide support for the null hypothesis. I often see people switching over from frequentist statistics when effects are significant to the use of Bayes factors to be able to provide support for the null hypothesis. But it is possible to test if there is a lack of an effect using p values. (Why no one ever told me this in the 11 years I worked in science is beyond me). It’s as easy as doing a t test, or, more precisely, as doing two t tests.

I’ve created my first R package, TOSTER (as in Two One-Sided Tests for Equivalence in R). Don’t worry, there is also an old-fashioned spreadsheet available as well (see “TOSTER Materials,” below).

Sometimes you perform a study where you might expect the effect to be zero or very small. So how can we conclude an effect is “zero or very small”?

One approach is to specify effect sizes we consider “not small.” For example, we might decide that effects larger than d = 0.3 (or smaller than d = –0.3 in a two-sided  t test) are “not small.” Now, if we observe an effect that falls between the two equivalence bounds of d = –0.3 and d = 0.3, we can act (in the good old-fashioned Neyman–Pearson approach to statistical inferences) as if the effect is “zero or very small.” It might not be exactly zero, but it is small enough.

We can use two one-sided tests to statistically reject effects ≤ –0.3 and ≥ 0.3. This is the basic idea of the TOST (two one-sided tests) equivalence procedure.
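Mechanically, the procedure is nothing more than two ordinary one-sided t statistics computed against the two bounds. A minimal sketch for two independent groups (with hypothetical summary statistics and the d = ±0.3 bounds mentioned above) looks like this; the TOSTER functions introduced below wrap exactly this logic:

# hypothetical summary statistics for two independent groups
m1 <- 10.2; m2 <- 10.0; sd1 <- 2.1; sd2 <- 1.9; n1 <- 50; n2 <- 50
sp    <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))  # pooled SD
bound <- 0.3 * sp                      # equivalence bounds of d = +/-0.3, in raw units
se    <- sp * sqrt(1 / n1 + 1 / n2)
df    <- n1 + n2 - 2
t_lower <- (m1 - m2 + bound) / se      # test of H0: true difference <= -bound
t_upper <- (m1 - m2 - bound) / se      # test of H0: true difference >= +bound
p_lower <- pt(t_lower, df, lower.tail = FALSE)
p_upper <- pt(t_upper, df, lower.tail = TRUE)
# equivalence is declared only when BOTH one-sided tests are significant
c(p_lower = p_lower, p_upper = p_upper, equivalent = max(p_lower, p_upper) < 0.05)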

The idea is simple, and it is conceptually similar to the traditional null-hypothesis test you probably already use to reject an effect of zero. But whereas all statistics programs will allow you to perform a normal  t test, it is not so simple to perform a TOST equivalence test.

Psychological science really needs a way to show effects are too small to matter (see Morey & Lakens, 2016). So I made a spreadsheet and R package to perform the TOST procedure. The free TOSTER package is available from the Comprehensive R Archive Network (CRAN), which means you can install it using install.packages(“TOSTER”) .

Let’s try a practical example using the vignette that comes along with the R package.

Eskine (2013) showed that participants who had been exposed to organic food were substantially harsher in their moral judgments relative to those in the control condition ( d = 0.81, 95% confidence interval: [0.19, 1.45]). A replication by Moery and Calin-Jageman (2016, Study 2) did not observe a significant effect (control: n = 95, M = 5.25, SD = 0.95; organic food: n = 89, M = 5.22,  SD = 0.83). The authors used Uri Simonsohn’s recommendation to power their study so that they had 80% power to detect an effect that the original study had 33% power to detect. This is the same as saying: We consider an effect to be “small” when it is smaller than the effect size the original study had 33% power to detect.
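That smallest effect of interest can be recovered with base R's power.t.test(), which solves for the effect size a two-group design with 21 participants per cell detects with 33% power (a sketch; the exact value depends slightly on how the power level is rounded):

# effect size the original n = 21 per-group design could detect with 33% power
power.t.test(n = 21, power = 1/3, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$delta
# comes out near 0.48 (in standard-deviation units, since sd = 1 by default)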

With n = 21 in each condition, Eskine had 33% power to detect an effect of d = 0.48. This is the effect the authors of the replication study designed their experiment to detect. The original study had shown an effect of d = 0.81, and the authors performing the replication decided that an effect size of d = 0.48 would be the smallest effect size they would aim to detect with 80% power. So we can use this effect size as the equivalence bound. We can use R to perform an equivalence test:

install.packages("TOSTER")
library("TOSTER")
TOSTtwo(m1 = 5.25, m2 = 5.22, sd1 = 0.95, sd2 = 0.83, n1 = 95, n2 = 89,
        low_eqbound_d = -0.48, high_eqbound_d = 0.48, alpha = 0.05)

# Which gives us the following output:
#
# Using alpha = 0.05 Student's t-test was non-significant, t(182) = 0.2274761, p = 0.8203089
# Using alpha = 0.05 the equivalence test based on Student's t-test was significant,
#   t(182) = -3.026311, p = 0.001417168
#
# TOST results:
#   t-value 1    p-value 1 t-value 2   p-value 2  df
# 1  3.481263 0.0003123764 -3.026311 0.001417168 182
#
# Equivalence bounds (Cohen's d):
#   low bound d high bound d
# 1       -0.48         0.48
#
# Equivalence bounds (raw scores):
#   low bound raw high bound raw
# 1    -0.4291159      0.4291159
#
# TOST confidence interval:
#   Lower Limit 90% CI raw Upper Limit 90% CI raw
# 1             -0.1880364              0.2480364

You see, we are just using R like a fancy calculator, entering all the numbers in a single function. But I can understand if you are a bit intimidated by R. So, you can also fill in the same info in the spreadsheet.

Using a TOST equivalence procedure with α = .05 and without assuming equal variances (because when sample sizes are unequal, you should report Welch’s t test by default), we can reject effects larger than d = 0.48, t (182) = –3.03, p = .001.

The R package also provides a graph, displaying the observed mean difference (in raw scale units), the equivalence bounds (also in raw scores), and the 90% and 95% CIs. If the 90% CI falls entirely within the equivalence bounds, we can declare equivalence.


Moery and Calin-Jageman concluded from this study: “We again found that food exposure has little to no effect on moral judgments.” But what is “little to no”? The equivalence test tells us the authors successfully rejected effects of a size the original study had 33% power to reject. Instead of saying “little to no,” we can put a number on the effect size we have rejected by performing an equivalence test.

If you want to read more about equivalence tests, including how to perform them for one-sample t tests, dependent t tests, correlations, or meta-analyses, you can check out a practical primer on equivalence testing using the TOST procedure I've written. It's available as a preprint on PsyArXiv. The R code is available on GitHub.

Daniel Lakens will speak at the 2017 APS Annual Convention, May 25–28, 2017, in Boston, Massachusetts. He also will speak at the International Convention of Psychological Science, March 23–25, 2017, in Vienna, Austria.

TOSTER Materials

The TOSTER spreadsheet is available here .

The TOSTER R package can be installed from CRAN using install.packages(“TOSTER”) .

The practical primer on equivalence testing using the TOST procedure is available here .

The R code is available here .

Detailed example vignettes are available here .

Eskine, K. J. (2013). Wholesome foods and wholesome morals? Organic foods reduce prosocial behavior and harshen moral judgments.  Social Psychological and Personality Science ,  4 , 251–254. doi:10.1177/1948550612447114

Lakens, D. (2015). Always use Welch’s t-test instead of student’s t-test. Retrieved from http://daniellakens.blogspot.nl/2015/01/always-use-welchs-t-test-instead-of.html

Lakens, D. (2016a). Introduction to equivalence testing with TOSTER. Retrieved from https://cran.rstudio.com/web/packages/TOSTER/vignettes/IntroductionToTOSTER.html

Lakens, D. (2016b). TOST equivalence testing R package (TOSTER) and spreadsheet. Retrieved from http://daniellakens.blogspot.com/2016/12/tost-equivalence-testing-r-package.html

Lakens, D. (in press). Equivalence tests: A practical primer for t-tests, correlations, and meta-analyses. Social Psychological and Personality Science.

Moery, E., & Calin-Jageman, R. J. (2016). Direct and conceptual Replications of Eskine (2013): Organic food exposure has little to no effect on moral judgments and prosocial behavior.  Social Psychological and Personality Science ,  7 , 312–319. doi: 10.1177/1948550616639649

Morey, R. D., & Lakens, D. (2016). Why most of psychology is statistically unfalsifiable. Retrieved from https://raw.githubusercontent.com/richarddmorey/psychology_resolution/master/paper/response.pdf

About the Author

Daniel Lakens is an experimental psychologist at the Human-Technology Interaction Group at Eindhoven University of Technology, the Netherlands. His main lines of empirical research focus on conceptual thought and meaning. In addition, he writes applied articles on statistics and teaches a free “Improving Your Statistical Inferences” MOOC on Coursera. He can be reached at [email protected] .


Llewellyn  E. van Zyl Ph.D.

The Happy-Productive Worker Hypothesis: Factor or Fallacy?

Navigating the complexities of happiness and productivity.

Posted April 24, 2024 | Reviewed by Davia Sills

  • The "happy-productive worker" hypothesis, which suggests that happy people are more productive, is largely a fallacy.
  • Research found contradictory evidence that challenges the notion of a direct, reciprocal relationship.
  • Individual, organizational, and contextual factors significantly moderate how happiness impacts productivity.
  • The relationship between happiness and performance is not always bidirectional.


Yesterday, my colleagues and I received the Top Cited Paper Award (2022-2023) for our paper on a positive psychological intervention we implemented for healthcare students. I was also informed that two of my papers in the Journal of Positive Psychology (the highest-ranked journal in positive psychology) were in the top three most-read and most-discussed manuscripts in the journal for the 2023 publication cycle. On top of this, I published around 29 papers (and had about 33 others in review somewhere). These are such wonderful accolades and provide some recognition for the great work which my team and I did. Undoubtedly, the 2022-2023 academic year has been the most productive and successful period in my entire academic career.

However, despite these achievements and my high performance, I found myself to be deeply unhappy. This seeming contradiction—having attained such remarkable success while grappling with profound personal dissatisfaction—has led me to ponder a fundamental question: Why was I able to function at such a high level yet feel tremendously unhappy at the same time? And more importantly, why did this high level of performance fail to translate into happiness?

These questions strike at the heart of the “happy-productive worker” hypothesis—an idea that has become one of the building blocks of positive psychology and organizational psychology. The core premise of this idea is that there exists a symbiotic, reciprocal relationship between an employee’s level of happiness and their work performance.

So why was this not the case for me? This seeming contradiction, having achieved some remarkable academic success while feeling deeply unhappy, provides a compelling case study to look into what the research actually says about this “happy-productive worker” hypothesis.


What Is the Happy-Productive Worker Hypothesis?

The idea that happier workers are more productive and that increased productivity leads to greater happiness has become a cornerstone of not only my discipline but also one of the main drivers in practice. The idea was popularized by Frank Landy (1985), who argued that there is a symbiotic, reciprocal relationship between employees’ level of happiness and their work performance. And it is an intuitively appealing concept that resonates with many managers and HR professionals.

The underlying logic seems sound—when employees feel content and fulfilled, they are more likely to engage enthusiastically in their work, exhibit creativity, and maintain high levels of motivation and resilience, all of which contribute to enhanced performance. And when these employees experience the positive outcomes associated with their increased productivity, such as recognition, advancement opportunities, and a sense of accomplishment, their overall happiness and job satisfaction are further bolstered.


Factors Moderating the Happiness-Performance Relationship

But if the happy-productive worker hypothesis is correct, then why does there seem to be an inverse relationship between my happiness and my productivity? It would seem that the unhappier I became, the better I performed, and the better I performed, the unhappier I got. Am I just an outlier—a statistical anomaly?

Well, not entirely! Although the idea is appealing, the empirical evidence supporting the claim has been mixed and even contradictory at times. Recent studies have revealed that the relationship between happiness and job performance is far more complex and nuanced than the simplistic, deterministic view often presented. As with many psychological phenomena, the relationship between happiness and job performance is likely moderated by various individual, organizational, and contextual factors.

  • On an individual level , factors like personality traits, personal life circumstances, job fit, and emotional self-regulation abilities can moderate how an employee’s experienced happiness connects to their work behaviors and performance. For example, highly conscientious individuals may find satisfaction in efficiently completing tasks, leading to higher productivity, while those with lower emotional stability may struggle to maintain happiness in stressful work environments.
  • Organizational factors such as culture, policies, and management practices also play a critical role in shaping employee happiness and job performance. A supportive and inclusive work environment that values employee well-being can contribute to higher levels of happiness, while factors like excessive workload, lack of autonomy, and poor management can undermine employee morale and lead to decreased happiness and performance.
  • Contextual factors like industry norms, economic conditions, and societal expectations can shape the relationship between happiness and job performance. In highly competitive or high-pressure industries, employees may feel compelled to prioritize productivity over their own well-being, leading to a potential mismatch between happiness and performance. Macroeconomic factors such as job insecurity and industry disruptions can also impact employees’ perceptions of their jobs and future prospects, influencing their happiness levels and performance outcomes.

Interestingly, research also shows that the relationship between happiness and performance is not always bidirectional. While happy employees may indeed be more productive in certain contexts, there are instances where heightened productivity may lead to burnout and even detract from overall well-being. Finally, the complexity of human motivation suggests that productivity can stem from various sources beyond happiness alone, including extrinsic incentives, intrinsic interest in the work itself, and a sense of duty or responsibility. Therefore, while fostering employee happiness is undoubtedly valuable for organizational success, the direct reciprocal link proposed by the happy-productive worker thesis may oversimplify the multifaceted nature of workplace dynamics.


So what does this all mean for you and me? Well, the realization that my academic performance has come at the cost of my personal well-being has been both humbling and enlightening. While the “happy-productive worker” hypothesis may hold true in certain contexts, my own experience and the latest research serve as a powerful testament to the complexities and nuances that underlie the relationship between happiness and performance. By understanding that various factors affect our happiness and performance, we can chart a path toward creating more sustainable practices that can generate perpetual energy!

Gutiérrez, O. I., Polo, J. D., Zambrano, M. J., & Molina, D. C. (2020). Meta-analysis and scientific mapping of well-being and job performance. The Spanish Journal of Psychology, 23, e43.

Pérez-Nebra, A. R., Ayala, Y., Tordera, N., Peiró, J. M., & Queiroga, F. (2021). The relationship between performance and well-being at work: a systematic review of 20 years and future perspectives in Brazil. Revista Psicologia Organizações e Trabalho , 21 (2), 1535-1544.

Sender, G., Nobre, G. C., Armagan, S., & Fleck, D. (2022). In search of the Holy Grail: A 20-year systematic review of the happy-productive worker thesis. International Journal of Organizational Analysis , 29 (5), 1199-1224.

Ayala, Y., Peiró Silla, J. M., Tordera, N., Lorente, L., & Yeves, J. (2017). Job satisfaction and innovative performance in young Spanish employees: Testing new patterns in the happy-productive worker thesis—A discriminant study. Journal of Happiness Studies: An Interdisciplinary Forum on Subjective Well-Being, 18 (5), 1377–1401. https://doi.org/10.1007/s10902-016-9778-1

Landy, F. J. (1985). Psychology of Work Behavior. Dorsey Press.

Llewellyn  E. van Zyl Ph.D.

Llewellyn E. van Zyl, Ph.D. , is a professor of positive psychology at the Optentia Research Unit within the North-West University and is attached to the Eindhoven University of Technology.


Research Challenge Grant winners announced

  • By Deborah Fox
  • April 26, 2024

Eight research projects involving 16 faculty members are the winners of the inaugural Dean’s Research Challenge Grants. Proposals submitted this year were required to focus on one of two themes: “Equity” or “Environment.”

Established in 2023 to encourage innovative scholarly work, the College of Arts and Sciences committed $47,000 to this initiative.

“Supporting faculty research is a top priority,” said Dr. Heather Dillaway, dean of the College of Arts and Sciences. “In the College of Arts and Sciences, we view the University’s motto of ‘Gladly we learn and teach,’ as being directly intertwined with research excellence. The purpose of universities, in the purest sense, is to create knowledge and share knowledge. That means we must invest in both research and teaching.”

The 2023-2024 winning proposals are:

Awards: Equity

  • Michael Hendricks (POL), along with co-PIs Noha Shawki (POL), Joan Brehm (SOA), and Eric Peterson (GEO) – Cleaning up the river without clearing out the neighborhood: Floating gardens in the Chicago River
  • Dan Ispas (PSY), along with co-PI Alexandra Ilie (PSY) – Exploring equity in personnel selection: Investigating differential prediction and applicant reactions to personality tests
  • Maura Toro-Morn (SOA-LALS), along with co-PI Jim Pancrazio (LAN) – a project to assess and capture the experiences of COBAS program participants
  • Susan Chen (ECO) – gender and race disparities in the impact of COVID-19 on job market outcomes

Awards: Environment

  • Ben Wodika (BIO) along with co-PIs Vickie Borowicz, Matt Dugas, and Sydney Metternich – Sugar Creek Urban Ecology Area Committee Application: Do species matter? A test of the Functional Equivalence Hypothesis using Ground Beetles
  • Christopher Mulligan (CHE) – Next generation analytical methods for on-site and on-demand environmental pollutant monitoring
  • Sudipa Topdar (HIS) – Lifeless Feathers, Masculinity, and Ecological Imperialism in the Late Colonial Himalayas (1800-1947)
  • Maochao Xu (MAT) – Cyberbullying Dynamics: A predictive modeling exploration in college communities

Award recipients will deliver presentations about their research projects at a College of Arts and Sciences Research Symposium on Friday, October 11. Additional details about this event will be forthcoming.


Functional Equivalence of Imagined vs. Real Performance of an Inhibitory Task: An EEG/ERP Study

Santiago Galdo-Alvarez

1 Department of Clinical Psychology and Psychobiology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain

Fidel M. Bonilla

2 Laboratory of Experimental Psychology, Faculty of Psychology, University El Bosque, Bogotá, Colombia

Alberto J. González-Villar

María T. Carrillo-de-la-Peña

Early neuroimaging and electrophysiological studies suggested that motor imagery recruited a different network than motor execution. However, several studies have provided evidence for the involvement of the same circuits in motor imagery tasks, in the absence of overt responses. The present study aimed to test whether imagined performance of a stop-signal task produces a pattern of motor-related EEG activity similar to that observed during real performance. To this end, mu and beta event-related desynchronization (ERD) and the Lateralized Readiness Potential (LRP) were analyzed. The study also aimed to clarify the functional significance of the Stop-N2 and Stop-P3 event-related potential (ERP) components, which were also obtained during both real and imagined performance. The results showed a common pattern of brain electrical activity, with a similar time course, during covert performance and overt execution of the stop-signal task: presence of LRP and Stop-P3 in the imagined condition, identical LRP onset, and similar mu and beta ERD temporal windows for both conditions. These findings suggest that a similar inhibitory network may be activated during both overt and covert execution of the task. Therefore, motor imagery may be useful for improving inhibitory skills and for developing new communication systems for Brain-Computer Interface (BCI) devices based on inhibitory signals.

Introduction

In recent decades, Brain-Computer Interface (BCI) communication systems have been developed successfully for a variety of clinical (Mak and Wolpaw, 2009) and non-clinical (Blankertz et al., 2012) applications. These systems are based mostly on the assumption that the mental rehearsal of an action recruits the same neural mechanisms as its real performance. In particular, the simulation theory, also known as the functional equivalence hypothesis (Jeannerod, 2001), suggests that a similar cortical network, including primary areas, is involved during both mental practice of a movement and its overt execution.

The assumption of a functional equivalence challenges the classical hierarchical view of the motor system. Since Penfield and colleagues reported that stimulation of specific neurons in the primary motor cortex (M1) resulted in movements following a somatotopic representation (Penfield and Boldrey, 1937; Penfield and Rasmussen, 1950), it has been generally assumed that M1 plays the role of a pure executor receiving orders from higher motor centers. In support of this view, early neuroimaging studies on motor imagery confirmed that primary and secondary motor areas were recruited during motor execution, but only secondary areas showed activation during mental practice of the same movements (Roland et al., 1980; Decety et al., 1988). Thus, they concluded that M1 is not activated when motor output is absent.

However, since then, many studies have questioned the hierarchical assumption and provided support for the functional equivalence hypothesis. Thus, various fMRI studies reported that the same network, including M1, was activated in motor imagery (Ersland et al., 1996 ; Porro et al., 1996 ; Roth et al., 1996 ; Lotze et al., 1999 ; Gerardin et al., 2000 ; Stippich et al., 2002 ). In several of these studies, it became clear that this activation could not be explained by subtle motor activity, as trials showing any EMG activity were discarded (Lotze et al., 1999 ; Gerardin et al., 2000 ; Lafleur et al., 2002 ).

Additional support for this hypothesis stems from event-related potential (ERPs) studies using the motor imagery paradigm (Galdo-Alvarez and Carrillo-de-la-Peña, 2004 ; Carrillo-de-la-Peña et al., 2006 , 2008 ; Kranczioch et al., 2009 ; Hohlefeld et al., 2011 ). Although the EEG/ERP technique is characterized by a low spatial resolution, it provides a direct online measure of cortical activation and allows testing whether similar processes are taking place in the same temporal interval (Cohen, 2014 ; Luck, 2014 ). Several studies have claimed that one particular ERP component, the lateralized readiness potential (LRP), is generated in M1. The LRP is obtained from central electrodes and reflects the lateralized portion of motor ERPs. The main evidence for M1 as the source of this component is the inversion of polarity found for lower limb movements, as compared to hand movements. Brunia ( 1980 ) explained the inversion by the somatotopical distribution of the neurons on the M1: hands are represented in the lateral surface of precentral gyrus, whereas legs are represented in the medial surface. In addition, source reconstruction of LRP activity using EEG (Böcker et al., 1994a , b ) and MEG (Praamstra et al., 1999 ) dipole modeling is consistent with the activation of M1.

Galdo-Alvarez and Carrillo-de-la-Peña ( 2004 ) reported that the LRP was present, although with a smaller amplitude, during covert performance, a result that the authors interpreted as evidence for the activation of M1 during motor imagery. Further research (Carrillo-de-la-Peña et al., 2006 , 2008 ) confirmed this finding and provided evidence of functional equivalence of overt and covert actions; e.g., similar timing for simple and sequential or complex movements, inversion of polarity for lower limbs, and similar activation for hand selection. In fact, Hohlefeld et al. ( 2011 ) reported that overt and covert movements differed in stimulus processing at early stages of response selection, rather than in motor processing.

From a different perspective, several studies have explored how motor imagery affects EEG oscillations related to movement, i.e., mu and beta bands recorded over the somatosensory and motor areas. Consistent with this, a similar motor-related EEG pattern generally referred to as mu and beta event-related desynchronization (ERD) has been found during motor imagery and actual movement (Pfurtscheller et al., 2006 ; Stavrinou et al., 2007 ; Nam et al., 2011 ). The findings of numerous studies using Transcranial Magnetic Stimulation (TMS) also indicate that motor imagery significantly increases corticospinal excitability (Mizuguchi et al., 2009 ; Roosink and Zijdewind, 2010 ; Williams et al., 2012 ).

Overall, the data on ERPs, EEG dynamics and TMS during motor imagery provide support for the functional equivalence hypothesis. However, the above-mentioned studies analyzed selection, preparation or execution of simple motor responses. In natural situations, motor skills and actions require fine executive processing that involves coding strength, direction and other muscle parameters and also the ability to reset and inhibit ongoing performance. It would therefore be interesting to explore the brain electrical activity during the covert performance of inhibitory tasks.

The go/no-go and the stop-signal tasks are the paradigms most commonly used to study response inhibition, understood as the ability to suppress, withhold, delay or interrupt ongoing or planned actions. The stop-signal task explores inhibition of an already initiated response, i.e., action cancellation, and thus implies greater inhibitory pressure on response-related processes than the go/no-go paradigm (Swick et al., 2011 ). Two fronto-central ERP components have been associated with performance of the stop-signal task: Stop-N2, a possible index of the conflict between an initiated go response and the stop signal, and Stop-P3, a component whose interpretation is still open to debate. The P3 amplitude is larger in successful than unsuccessful stop (US) trials and in subjects with fast stop performances (requiring greater inhibitory activation; Dimoska et al., 2006 ), supporting its interpretation as an index of inhibitory efficiency. It has been suggested that the source of Stop-P3 may be in the premotor cortex, a region believed to be responsible for mediating stop-signal inhibition (Kok et al., 2004 ; Ramautar et al., 2006 ). Nevertheless, its latency appears to be too late to reflect the initial process of voluntary response inhibition, and it has thus been interpreted as an index of evaluation of the inhibitory process (Huster et al., 2013 ). It has been also suggested that in no-go and stop trials this positivity may be modulated by the lack of negative activity associated with motor preparation (Kok, 1986 ; Verleger et al., 2006 ).

Although the recording of brain activity during the covert performance of an inhibitory task could provide additional support for the functional equivalence hypothesis, as far as we know, there is only one study comparing actual and imagined performance of a stop-signal task (González-Villar et al., 2016). Using auditory stimuli as stop signals, they found similar Stop-N2, Stop-P3 and mu and beta ERD in mental rehearsal and real performance of the task, but did not study the LRP as a possible index of M1 activation.

Thus, the main aim of the present study was to test whether covert performance of a stop-signal task produces the same pattern of motor-related EEG activity as that observed during real performance. To this end, mu and beta ERD and the LRP were obtained during both imagined and real performance of go and stop trials. A similar pattern in these indices across both conditions would support the general applicability of the functional equivalence hypothesis to tasks that exert increased executive control over motor performance, as the stop-signal task does.

An additional objective was to replicate the previous study, testing whether the ERP indices that characterize response cancellation (i.e., Stop-N2 and Stop-P3) are also present during the covert performance of the Stop-signal task, using visual stimuli both as targets and as stop signals. Specifically, the presence of Stop-P3 in the covert condition could provide indirect evidence on the activation of an inhibitory network during imagery.

The present study also attempted to clarify the functional meaning of Stop-N2 and Stop-P3. Comparison of ERP components (LRP, Stop-N2 and Stop-P3) produced in US, successful stop (SS) and Imagined Stop (IS) trials may shed some light on the role of motor execution or outcome correction processes in classical ERP inhibition indices.

Materials and Methods

Participants

A total of 18 students (5 men, 13 women) aged 19 to 32 years (mean = 20.89; SD = 1.72) participated voluntarily in the study. All were right-handed according to the Edinburgh handedness inventory and reported normal or corrected vision. None had a history of neurological or psychiatric disorders, or of drug abuse. Informed consent was obtained from all participants, in accordance with the Declaration of Helsinki.

Stimuli and Apparatus

The primary task was a choice reaction task in response to white arrows pointing to the left or the right (stimulus duration: 500 ms; mean interval between stimulus onsets: 2100 ms), which indicated the hand with which participants had to respond. The start of each trial was indicated by the appearance of a fixation cross in the center of the screen; the white arrow then replaced the fixation cross. The arrow consisted of an arrowhead and a tail and subtended 2.1° × 1.4° of visual angle. In 30% of trials, a red arrow (stop signal) indicated that subjects had to cancel the already prepared response.

The task was designed and presented using the STIM program (Neuroscan Labs). The stimuli were presented on a 15″ screen located at a distance of 100 cm from the subjects. Participants responded using a response box held in their hands.

Design and Procedure

Participants were seated comfortably in an armchair in a dimly lit, sound attenuated room. They were instructed to look at the fixation cross in the center of the screen and to press a button with their right or left thumb according to the direction indicated by the white arrow. They were informed that in some trials a red arrow might appear after the white arrow, indicating that the response should be canceled. Subjects were instructed to respond as quickly as possible to the white arrow and not to wait for the appearance of the stop signal. They completed some practice trials before the first block of experimental trials.

In the real condition, the time interval between the onset of the go signal and the stop signal was 300 ms in the first trial and was then adjusted according to the subject’s performance (ranging from 160 to 400 ms in 40 ms steps). The interval was altered using a staircase-tracking algorithm that adjusts the go-stop interval on a given trial depending on the outcome of the previous stop trial (Band and van Boxtel, 1999). This algorithm yields approximately equal proportions of successful and unsuccessful stop trials. If the response in the previous stop trial was correctly inhibited, the interval between go and stop signals in the next stop trial was 40 ms longer, thereby making successful inhibition more difficult; if the subject responded in the previous stop trial, the interval in the next stop trial was 40 ms shorter, in order to facilitate inhibition (Logan and Cowan, 1984).
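The following is a minimal sketch (not the authors' code) of the staircase-tracking rule described above, assuming a hypothetical trial loop that reports whether the previous stop trial was successfully inhibited; the 40 ms step and the 160–400 ms limits are taken from the text.

```python
def update_go_stop_interval(current_ms: int, previous_stop_inhibited: bool) -> int:
    """Adjust the go-stop interval by 40 ms, clamped to the 160-400 ms range."""
    step = 40
    if previous_stop_inhibited:
        candidate = current_ms + step   # successful inhibition: make the next stop harder
    else:
        candidate = current_ms - step   # failed inhibition: make the next stop easier
    return max(160, min(400, candidate))

# Example: starting at 300 ms, a successful inhibition raises the delay to 340 ms,
# and a subsequent failure lowers it back to 300 ms.
interval = 300
interval = update_go_stop_interval(interval, previous_stop_inhibited=True)   # 340
interval = update_go_stop_interval(interval, previous_stop_inhibited=False)  # 300
```

Because the step is symmetric, the procedure converges on the delay at which inhibition succeeds on roughly half of the stop trials, which is what makes the SSRT estimate described in the footnote straightforward.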

In the imagined condition, subjects were instructed to imagine as vividly as possible responding with the hand of the side pointed by the white arrow, and to withhold the response (like braking suddenly) when the stop signal appeared. They had to keep their hands on the response box, as in real performance. In this condition, due to the lack of response feedback, the Go-Stop signal interval was fixed at 300 ms.

The task for each condition consisted of 280 trials: 70% go trials (196 trials, 98 for each direction) and 30% stop trials (84 trials, 42 for each direction). The order of the tasks was always the same: first overt execution and then covert performance. This procedure was used to ensure more effective mental rehearsal after real practice, as shown by previous studies (Cunnington et al., 1996; Carrillo-de-la-Peña et al., 2006). Participants were allowed a 5 min rest between the two tasks.

Psychophysiological Recording and Data Analyses

The EEG was recorded from 28 electrode sites (10–20 international system) referenced to the left and right mastoids, using pure tin electrodes attached to a fabric cap (Electro-Cap International, Inc., Eaton, OH, USA). The electrooculogram (EOG) was recorded from sites above and below the left eye and from electrodes lateral to each eye. The AFz electrode served as ground electrode. Electrode impedances were kept below 10 kΩ. The EEG signals were digitized online with Neuroscan equipment (Neuroscan Laboratories, version 4.1), amplified 10,000 times (SynAmp Model 5083 amplifier), filtered using a band-pass between 0.1 and 100 Hz and a notch filter of 50 Hz, and sampled at a rate of 500 Hz.

The EEG data were analyzed using the EEGLAB 12.02 toolbox (Delorme and Makeig, 2004). The data were resampled to 250 Hz and re-referenced to the average reference. Poorly recorded channels were replaced by spherical-spline interpolation, and EEG segments containing large ocular or other artifacts were rejected after visual inspection. The data were digitally filtered using a low-pass 30 Hz FIR filter. An Independent Component Analysis algorithm was used to remove components associated with ocular artifacts. The EEG data used for the ERP analyses were baseline corrected from −200 to 0 ms. Epochs were extracted from 200 ms pre-stimulus to 900 ms post-stimulus, time-locked to the go stimuli (white arrows) and to the stop stimuli (red arrows; the latter only for N2 and P3 analyses). The ERPs used to measure the N2 wave were filtered with a 2–12 Hz band-pass filter to avoid overlap with other ERP waves.

The stop-signal task is complicated by the fact that the activity evoked by the stop stimuli overlaps with the activity evoked by the preceding go signal. To resolve this, we subtracted the activity evoked by go trials from the ERPs obtained in stop trials. First, we calculated the percentage of SS and US trials for each subject, and this percentage was used to select go trials in the following way: if a participant had 45% US among all stop trials, the fastest 45% of go epochs were used as the pool of trials for the US minus fast-go subtraction, and the remaining, slowest 55% of go trials were used as the pool for the SS minus slow-go subtraction. A random go epoch (selected from its respective pool) was then assigned to each stop epoch. Finally, stop and go epochs were aligned by the go signal, and the subtraction was computed. This method was applied in previous studies (Kok et al., 2004; Ramautar et al., 2006).
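A minimal NumPy sketch of this trial-matching subtraction is given below, under purely illustrative assumptions: random arrays stand in for real epochs of shape trials × channels × samples, and all names and shapes are hypothetical rather than taken from the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
go_epochs = rng.normal(size=(196, 28, 275))   # all go trials (trials x channels x samples)
go_rts = rng.uniform(0.3, 0.6, size=196)      # reaction times for the go trials
ss_epochs = rng.normal(size=(46, 28, 275))    # successful-stop epochs
us_epochs = rng.normal(size=(38, 28, 275))    # unsuccessful-stop epochs

# Proportion of unsuccessful stops determines how the go pool is split by speed.
p_us = us_epochs.shape[0] / (us_epochs.shape[0] + ss_epochs.shape[0])
order = np.argsort(go_rts)                    # fastest go trials first
n_fast = int(round(p_us * len(order)))
fast_pool = go_epochs[order[:n_fast]]         # fastest go trials -> paired with US epochs
slow_pool = go_epochs[order[n_fast:]]         # slowest go trials -> paired with SS epochs

# Assign a random go epoch from the matching pool to each stop epoch, align by the go
# signal (already the case here, since all epochs are go-locked), and subtract.
us_corrected = us_epochs - fast_pool[rng.integers(0, len(fast_pool), size=us_epochs.shape[0])]
ss_corrected = ss_epochs - slow_pool[rng.integers(0, len(slow_pool), size=ss_epochs.shape[0])]
```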

The LRP was obtained by the average method proposed by Coles ( 1989 ), i.e., it was computed by subtracting ERP activity at C3 minus C4 for the right responses and C4 minus C3 for the left responses, and then averaging the resulting difference waveforms. This removes non-motor contribution from this index of lateralized activity associated with response preparation. LRPs were obtained for each trial (go, stop) and task (overt, covert). Also, the topographical distributions of LRPs were calculated using the method described by Praamstra and Seiss ( 2005 ), applying the average method to obtain LRP from each pair of contralateral electrodes (e.g., F3/F4, FC3/FC4…; only for go trials in both tasks).
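As a concrete illustration of Coles' (1989) averaging method described above, the following sketch (not the authors' implementation) assumes hypothetical trial-averaged waveforms, one value per time sample, recorded at C3 and C4 for left- and right-hand responses.

```python
import numpy as np

def lrp_average_method(c3_right, c4_right, c3_left, c4_left):
    """LRP = mean of (C3 - C4) for right-hand responses and (C4 - C3) for left-hand responses."""
    right_diff = np.asarray(c3_right) - np.asarray(c4_right)
    left_diff = np.asarray(c4_left) - np.asarray(c3_left)
    return 0.5 * (right_diff + left_diff)
```

Because activity that is not tied to the responding hand appears with the same sign at both electrodes, it cancels in the averaged difference, leaving only the lateralized portion associated with response preparation.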

Mean amplitudes were obtained for N2 (200–260 ms interval) and P3 (260–450 ms interval) at the FCz electrode site. As different numbers of trials were available for the different conditions, mean amplitudes were measured instead of peak amplitudes to avoid bias due to differing signal-to-noise ratios.

Time-frequency analysis was performed by convolving the EEG data with a family of complex Morlet wavelets ranging in frequency from 3 to 30 Hz in 27 linearly increasing steps, and with logarithmically increasing cycles, from three cycles at the lowest frequency to eight at the highest frequency. The power data obtained after convolution were baseline corrected by transforming the power change of each time-frequency pixel to dB, relative to the mean power of each frequency in the baseline interval (−400 to −100 ms).
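The dB baseline normalization described above can be sketched as follows; the array layout (frequencies × time points) and the time vector in milliseconds are illustrative assumptions, and the wavelet convolution itself is omitted.

```python
import numpy as np

def power_to_db(power, times_ms, baseline=(-400, -100)):
    """Convert a (frequencies x times) power array to dB change relative to the
    mean power of each frequency in the baseline window."""
    times_ms = np.asarray(times_ms)
    mask = (times_ms >= baseline[0]) & (times_ms <= baseline[1])
    base = power[:, mask].mean(axis=1, keepdims=True)   # one baseline value per frequency
    return 10 * np.log10(power / base)
```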

As the frequencies of interest here are more prominent around Rolandic areas, we first averaged spectrograms of C3 and C4 electrodes. For analysis of mu and beta oscillations, time-frequency windows were selected after averaging the spectrograms for Trial (go, stop) and Task (overt, covert) together, to avoid making assumptions about condition differences. We observed that mu band had two peaks at different latencies (at around 450 and 700 ms, respectively), and we therefore extracted two different windows (from 300 to 550 ms and from 600 to 900 ms) in the 9–13 Hz range. For the beta band, we extracted the mean power from 200 to 550 ms between 18 and 24 Hz.

Statistical Analysis

Behavioral and ERP parameters were analyzed according to the measures available in each condition. Thus, given the lack of a motor response in the motor imagery condition, we carried out t tests to compare reaction times (RTs) between overt go responses and overt US trials.

In order to assess the possible existence of LRPs during covert motor performance, we carried out one-sample Wilcoxon tests for the mean of five consecutive windows of 50 ms each, with a step size of 10 ms between windows (i.e., each window had an overlap of 40 ms with the prior window), starting 40 ms before the peak latency (approximately 370 ms). If significant differences were found for all the windows, we could conclude that the waveforms deviated significantly from baseline and thus that LRPs were also present during mental rehearsal of movements in the different conditions of the task.
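A minimal sketch of this sliding-window test is shown below, using SciPy for illustration (the authors' statistical software is not specified). It assumes a hypothetical array of per-participant LRP waveforms sampled at 250 Hz with epochs starting at −200 ms; the window placement follows the description above (five 50 ms windows stepped by 10 ms, beginning 40 ms before the peak latency).

```python
import numpy as np
from scipy.stats import wilcoxon

def windowed_wilcoxon(lrp, peak_ms=370, srate=250, epoch_start_ms=-200,
                      win_ms=50, step_ms=10, n_windows=5):
    """One-sample Wilcoxon test of the mean LRP amplitude in successive windows.

    lrp: array of shape (participants, samples)."""
    ms_per_sample = 1000 / srate
    results = []
    for k in range(n_windows):
        w_start = peak_ms - 40 + k * step_ms              # window onset in ms
        i0 = int(round((w_start - epoch_start_ms) / ms_per_sample))
        i1 = i0 + int(round(win_ms / ms_per_sample))
        window_means = lrp[:, i0:i1].mean(axis=1)          # one mean amplitude per participant
        stat, p = wilcoxon(window_means)                   # tests deviation from zero
        results.append((w_start, p))
    return results
```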

LRP mean amplitudes were measured in the 300–400 ms interval. The LRP onset latencies were determined using the jackknife procedure: 18 different grand averages were computed for each of the experimental conditions by omitting one participant from each grand average. The onset was then measured using the method proposed by Schwarzenau et al. (1998), which assumes that the onset of correct preparation corresponds to the intersection point of two straight lines, one fitted to the baseline and the other to the rising slope of the LRP.
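The jackknife-plus-intersection procedure can be sketched as follows. This is a simplified stand-in for Schwarzenau et al.'s (1998) routine (which estimates the two segments jointly); the fitting windows and array shapes are illustrative assumptions.

```python
import numpy as np

def two_line_onset(times_ms, lrp, baseline_win=(-200, 0), rise_win=(250, 400)):
    """Onset = intersection of a line fitted to the baseline and a line fitted
    to the rising flank of the LRP (a simplification of Schwarzenau et al., 1998)."""
    b = (times_ms >= baseline_win[0]) & (times_ms <= baseline_win[1])
    r = (times_ms >= rise_win[0]) & (times_ms <= rise_win[1])
    m1, c1 = np.polyfit(times_ms[b], lrp[b], 1)   # baseline segment (slope, intercept)
    m2, c2 = np.polyfit(times_ms[r], lrp[r], 1)   # rising segment
    return (c1 - c2) / (m2 - m1)                  # time at which the two lines cross

def jackknife_onsets(times_ms, lrp_by_subject):
    """One onset estimate per leave-one-out grand average (n estimates for n participants)."""
    n = lrp_by_subject.shape[0]
    return np.array([
        two_line_onset(times_ms, np.delete(lrp_by_subject, i, axis=0).mean(axis=0))
        for i in range(n)
    ])
```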

For the LRP, N2 and P3 mean amplitudes and the beta and mu ERD power, repeated-measures analyses of variance (ANOVAs) were carried out with two within-subject factors (Trial: go, stop; Task: overt, covert). In these analyses, overt-response stop trials included only those trials in which successful inhibition was observed. Possible differences between tasks in go LRP topography were analyzed using a repeated-measures ANOVA on LRP mean amplitudes (200–400 ms), with Task (overt, covert) and Electrode pair (F3/F4, FC3/FC4, C3/C4, CP3/CP4, P3/P4) as within-subject factors. The LRP onsets were subsequently analyzed by means of a repeated-measures ANOVA with two within-subject factors (Trial: go, stop; Task: overt, covert). The F values in the latter case were corrected using the formula Fc = F/(n − 1)², as recommended when the jackknife procedure is used for statistical analysis (Ulrich and Miller, 2001).
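A worked illustration of the jackknife correction just described, with purely hypothetical numbers:

```python
def corrected_jackknife_f(f_value: float, n: int) -> float:
    """Correct an F value computed on jackknifed (leave-one-out) estimates (Ulrich & Miller, 2001)."""
    return f_value / (n - 1) ** 2

# With the present sample size (n = 18), a hypothetical uncorrected F of 29.0 would
# shrink to Fc = 29.0 / 17**2 ≈ 0.10, illustrating how strongly the correction
# compensates for the artificially low variance of jackknifed estimates.
print(round(corrected_jackknife_f(29.0, 18), 2))
```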

To clarify the effect of successful vs. unsuccessful performance of the stop-signal task, additional repeated-measures ANOVAs were carried out with the within-subject factor Performance (SS, US, IS) for the same parameters.

Results

Behavioral Performance

Table 1 shows behavioral indices for go and stop trials (as means of left- and right-hand responses). For go trials, the data included percentages of hits, errors and missing responses, as well as RTs for hits and errors. For stop trials, the percentage of US trials and their RTs, as well as mean stop signal delay (SSD) values and stop signal reaction times (SSRTs), are provided¹.

Table 1. Behavioral parameters for the overt performance of the stop-signal task.

RTs, reaction times; US, unsuccessful stop trials; SSD, stop signal delay; SSRT, stop signal reaction time.

The percentage of US was about 50%, as expected given the use of the staircase tracking algorithm. RTs were faster in US trials than in go trials ( t = 5.8, p < 0.001).

Figure 1 presents the LRP obtained at different pairs of electrode sites and the scalp distribution of the component. Figure 2 presents the average waveforms of the EMG, LRP and stimulus-locked components (N2, P3) obtained from go and stop trials in both overt and covert performance, as well as the scalp distribution for each component.

Figure 1. Lateralized Readiness Potential (LRP) time-locked to the go signal for each condition in different scalp locations. Plotted grand averages of Successful Stop (SS) and Unsuccessful Stop (US) were computed using 12 participants, while Go Real, Go Im and Imagined Stop (IS) were computed using 18 participants. Topography represents the mean LRP amplitude of all conditions from 200 to 400 ms.

Figure 2. (A) Rectified electromyogram (EMG) for each condition, showing that no EMG activity was registered after stimulus presentation during the imagined task. (B) LRP time-locked to the go signal and the topographies of the shaded area. SS and US grand averages of the LRPs were computed using 12 participants, while Go Real, Go Im and IS were computed using 18 participants. Topographies were calculated using the method described by Praamstra and Seiss (2005). (C) Event-related potentials (ERPs) for each task and condition at the FCz electrode site and their topographies in the windows selected to measure the N2 and P3 components. Note that go trials were averaged time-locked to the go signal, while SS, US and IS were averaged time-locked to the stop signal and with go-stimulus ERPs subtracted.

One-sample Wilcoxon tests were performed to confirm the existence of LRPs in covert response trials. All comparisons revealed significant differences from 0, and therefore we can conclude that the LRP is present in motor imagery for both go and stop trials (Table 2). The mean values and standard deviations for all the ERP parameters measured, including the LRP, are shown in Table 3.

Table 2. One-sample Wilcoxon tests for covert-trial Lateralized Readiness Potential (LRP) amplitude.

*p < 0.05; **p < 0.01; ***p < 0.001.

Table 3. Means and standard deviations (in parentheses) for the measured event-related potential (ERP) parameters and mu and beta event-related desynchronization (ERD).

Note: LRP data for Unsuccessful Stop (US) trials were obtained from 12 participants; for the other parameters, EEG recordings from all 18 participants were used.

The repeated-measures ANOVA (Trial × Task) for LRP amplitude showed significant main effects of Trial ( F (1,17) = 22.4; p < 0.001) and Task ( F (1,17) = 9.3; p = 0.007), but no interaction effect ( F (1,17) = 3.0; p = 0.1). The LRP amplitude was larger in go than in stop trials, and it was larger when the participants had to perform an overt response task than when they had to imagine the response.

In the analysis of go LRP topography, the ANOVA revealed significant effects for Electrode ( F (4,68) = 12.2; p < 0.001), Task ( F (1,17) = 9.9; p < 0.01), and for the interaction of both factors ( F (4,68) = 4.8; p < 0.01). Post hoc comparisons showed that LRP mean amplitude was significantly larger for overt than covert go trials only in fronto-central electrodes ( p < 0.01 for F3/F4; p < 0.001 for FC3/FC4; and p < 0.01 for C3/C4) but not in the posterior locations ( p = 0.081 for CP3/CP4 and p = 0.28 for P3/P4). In addition, topographical distribution was similar in both tasks (overt response task: central electrodes > rest of electrode sites except fronto-central electrodes, fronto-central electrodes > frontal and parietal electrodes, and central-parietal electrodes > parietal electrodes; covert response task: fronto-central and central electrodes > central-parietal > frontal and parietal electrodes).

The repeated-measures ANOVA to clarify the effect of successful vs. unsuccessful performance was applied to data from 12 participants, as six of the participants did not produce enough artifact-free US epochs for each hand to yield the LRP. The ANOVA revealed a significant effect of the factor ( F (2,22) = 6.5; p = 0.005), as LRP amplitudes were larger for US than for SS trials ( p = 0.031) and covert stop trials ( p = 0.033); however, no differences between the latter two conditions were found ( p = 1).

The repeated-measures ANOVA (Trial × Task) for LRP onset did not reveal any significant differences for Trial (Fc (1,17) = 0.1; p = 0.7), Task (Fc (1,17) = 0.05; p = 0.8) or the interaction between these factors (Fc (1,17) < 0.01; p = 0.9). The repeated-measures ANOVA with Performance as within-subjects factor did not show a significant effect for LRP onset ( N = 12) either (Fc (2,22) = 0.03; p = 0.9).

N2 Mean Amplitude

The repeated-measures ANOVA (Trial × Task) did not reveal any significant effect of Trial ( F (1,17) = 1.1; p = 0.3), Task ( F (1,17) = 0.8; p = 0.4) or the interaction between these factors ( F (1,17) = 0.1; p = 0.7).

The repeated-measures ANOVA showed a significant effect of Performance ( F (2,34) = 10.6; p ≤ 0.001). The N2 amplitude was larger for US than for SS trials ( p = 0.019) and covert stop trials ( p = 0.002); no differences were found between these two conditions ( p = 1).

P3 Mean Amplitude

The repeated-measures ANOVA (Trial × Task) revealed a significant effect of Trial ( F (1,17) = 11.3; p = 0.004). The P3 amplitude was larger in stop than in go trials. The ANOVA did not reveal significant effects of Task ( F (1,17) = 3.0; p = 0.1) or of the Trial × Task interaction ( F (1,17) = 3.0; p = 0.1).

The repeated-measures ANOVA did not reveal a significant effect of the factor Performance ( F (2,34) = 1.1; p = 0.3).

Beta ERD (200–550 ms)

Figure 3 shows the time-frequency analyses of both beta and mu ERD.

Figure 3. Time-frequency analyses. (A) Spectrogram showing the time-frequency power averaged across all conditions in the C3 and C4 electrodes. This plot was used to select time-frequency windows for statistical comparisons. (B) Mean mu (9–13 Hz) and beta (18–24 Hz) power for each task and condition, all time-locked to the go signal. As explained in the “Materials and Methods” Section, mu event-related desynchronization (ERD) presents two peaks (especially in stop trials), in both real and imagined performance. Shaded areas encircle the time intervals submitted to statistical analyses. Mu and beta ERD show a similar time course in covert and overt performance, although with a reduced power decrease in the former. (C) Topographies of power modulations in each shaded area and condition.

The repeated-measures ANOVA (Trial × Task) revealed a significant effect of Task ( F (1,17) = 20.6; p < 0.001). Beta desynchronization was larger for overt than for covert response trials. The ANOVA did not reveal a significant effect of Trial ( F (1,17) = 1.3; p = 0.3) or the interaction between these factors ( F (1,17) = 1.9; p = 0.2).

The repeated-measures ANOVA revealed a significant effect of Performance ( F (2,34) = 9.9; p < 0.001). A larger decrease in power was found in SS ( p = 0.001) and US ( p = 0.021) than in IS trials, but no differences were found between successful and US trials ( p = 1).

Mu ERD (300–550 ms)

The repeated-measures ANOVA revealed a significant effect of Task ( F (1,17) = 7.2; p = 0.016). Mu desynchronization was larger in overt than in covert response trials. The ANOVA revealed no significant effect of Trial ( F (1,17) = 0.1; p = 0.8) or the interaction between these factors ( F (1,17) < 0.001; p = 0.9).

The repeated-measures ANOVA revealed a significant effect of Performance on Mu ERD ( F (2,34) = 4.2; ɛ = 0.69; p = 0.041), although multiple pairwise comparisons (Bonferroni adjusted) did not reveal any significant differences.

Mu ERD (600–900 ms)

The repeated-measures ANOVA (Trial × Task) revealed a significant effect of Task ( F (1,17) = 15.4; p = 0.001). Mu desynchronization was larger for overt than for covert response trials. The ANOVA did not reveal a significant effect of Trial ( F (1,17) = 2.2; p = 0.15) or the interaction between these factors ( F (1,17) = 1.5; p = 0.2).

The repeated-measures ANOVA (Performance) revealed a significant effect of the factor ( F (2,34) = 11.9; p < 0.001). A larger decrease in power was observed in SS ( p = 0.004) and US ( p = 0.004) than in IS trials, but no differences were found between SS and US trials ( p = 0.5).

Discussion

The main goal of the present study was to determine whether a similar pattern of motor-related brain electrical activity is shared by the overt and covert performance of the stop-signal task, a paradigm that exerts strong executive (inhibitory) control. To better capture the power and phase dynamics of the EEG, we included time-frequency analyses (mu and beta ERD) in addition to phase-locked averaged responses (i.e., ERPs).

The results of the present study indicate that covert performance of the stop-signal task appears to recruit brain mechanisms similar to those used during overt execution, with a similar time course.

The presence of lateralized preparatory activity at central electrodes in the motor imagery condition suggests that M1 is actively involved in the simulated performance of the task. Despite the low spatial resolution of EEG, it is generally considered that the neural source of the LRP component is located in M1, as revealed by dipole estimation from EEG (Böcker et al., 1994a, b) and MEG studies (Praamstra et al., 1999), and given its inversion of polarity depending on the limb that performs the movement (Brunia, 1980; Carrillo-de-la-Peña et al., 2006). The study findings also confirmed that the temporal pattern of activation is the same in covert and overt performance, as no difference was found in LRP onset between conditions.

It could be questioned whether our LRP results conclusively support M1 activation during motor imagery. In fact, it has been argued that, depending on the physical setting of the visual stimuli, the LRP could reflect lateralized posterior activity rather than motor processing (Praamstra, 2007). In addition, with asymmetric stimuli (as is the case for arrows), other components related to attentional shifts, such as the early directing-attention negativity (EDAN), the anterior directing-attention negativity (ADAN) and the late directing-attention positivity (LDAP; Verleger et al., 2000; Praamstra et al., 2005; Gherri and Eimer, 2010; Praamstra and Kourtis, 2010), or inhibitory mechanisms, such as the N2cc component (Oostenveld et al., 2001; Praamstra and Oostenveld, 2003; Praamstra, 2006; Cespón et al., 2012), might also overlap with the LRP.

Given that we did not present stimuli eccentrically (all were presented in the center of the screen), the contribution of lateralized brain activity associated with stimulus processing can reasonably be ruled out. The LRP scalp distribution, with maximal amplitudes between frontocentral and central electrode sites and reduced amplitude towards more anterior and posterior sites, is also inconsistent with reports of the topographical distribution of attention-shift ERP waves such as EDAN, ADAN and LDAP. In addition, in a previous study using the same array of stimuli (arrows with the same tail and head sizes), we reported an inversion of polarity when participants performed the task using foot movements (in both overt and covert trials; see Carrillo-de-la-Peña et al., 2006), an effect that supports the contribution of M1 to the generation of the LRP (Brunia, 1980; Böcker et al., 1994a, b). In any case, our results indicate that a similar brain network is involved in real and imagined inhibition, regardless of whether it reflects M1 activation, activation of frontoparietal networks, or engagement of premotor inhibitory mechanisms.

The amplitude of the LRP was smaller in motor imagery than in the overt motor execution and inhibition, as consistently observed in previous studies (Galdo-Alvarez and Carrillo-de-la-Peña, 2004 ; Carrillo-de-la-Peña et al., 2006 , 2008 ). Although this might be interpreted as a sign of weaker motor activation in simulated performance, it is open to alternative explanations. As LRP was also smaller in stop trials than in go trials in the overt condition, it could be argued that the smaller LRP amplitudes in motor imagery are due to the presence of larger or sustained motor inhibition during the task. Alternatively, previous studies have also indicated that differences between overt and covert conditions may be due to stimulus processing (Hohlefeld et al., 2011 ) or the lack of feedback or control from somatosensory areas (Carrillo-de-la-Peña et al., 2008 ) rather than to motor activation processes.

Results of time-frequency analyses paralleled those found for LRP and provide a complementary view of the temporal dynamics of motor-related EEG in stop-signal tasks. As in previous studies (Pfurtscheller and Neuper, 1997 ; McFarland et al., 2000 ), we observed mu and beta ERD over the lateral central electrode sites during motor imagery; again, the decrease in power of those central rhythms was larger in overt performance. Although some studies have related the power of these bands to motor cortex activation, it has also been demonstrated that bilateral mu and beta ERD may be associated specifically with activation of the somatosensory cortex (Jurkiewicz et al., 2006 ).

In relation to the ERP components characteristic of the stop-signal task, we found that only the P3 was significantly larger for stop than for go trials, also in the simulated condition. The presence of Stop-P3 in the latter condition suggests that subjects actually canceled an already prepared response even during motor imagery. This result replicates a previous study with auditory stop signals that found similar P3 amplitude and midfrontal theta in imagined and in successfully stopped trials (González-Villar et al., 2016). As explained below, this finding has practical implications and contributes to understanding the functional meaning of Stop-P3.

The inhibition of inappropriate responses is an important part of goal-oriented behavior. From a practical point of view, the observed involvement of similar neural circuits in the covert performance of the stop-signal task suggests the possibility of training inhibitory skills through mental rehearsal. Non-invasive methods of recording brain signals, such as the EEG, are widely used in BCI. To date, only brain electrical activity indices of motor activation or stimulus detection have been used in BCI communication systems. Our findings suggest that the indices of inhibition obtained in motor imagery could also be used for communication and could be useful for developing hybrid BCIs that incorporate various sensing modalities (e.g., detection of directional movement and inhibition of that movement).

Previous studies have found larger N2 and P3 amplitudes for stop than for go trials. These modulations are usually interpreted as reflecting inhibitory control (De Jong et al., 1990 ; Dimoska et al., 2003 , 2006 ), although it has also been considered that N2 may reflect conflict detection (Carter et al., 1998 ; Nieuwenhuis et al., 2002 , 2003 ; Donkers and van Boxtel, 2004 ; Yeung et al., 2004 ; Enriquez-Geppert et al., 2010 ), and P3 the evaluation of the inhibitory process, because of its latency (Huster et al., 2013 ). Nonetheless, other differences between go and stop trials may contribute to the N2 and P3 modulations reported: first, a motor response, including muscular activation, is only present in go trials (and US trials); second, a stop signal is present only in stop trials, and therefore these trials involve double processing (go stimulus and stop stimulus) that may overlap. Thus, the functional significance of Stop-N2 and Stop-P3 is far from clear.

In the present study, two different experimental manipulations were carried out to tease apart these alternative explanations: the inclusion of motor imagery to confirm or dismiss the role of motor execution processes (as no overt response is present during mental rehearsal of the stop-signal task), and the application of a procedure to remove go stimulus-linked activity from stop trials (see “Materials and Methods” Section).

It has been suggested that P3 in no-go trials may be due to the absence of movement-related negativity (Salisbury et al., 2004 ), and this could be extrapolated to Stop-P3. In the present study, no movement was present in either covert go or stop trials, but a prominent Stop-P3 appeared only in the latter. After comparing a press no-go and a count no-go condition, Smith et al. ( 2013 ) also concluded that P3 is due to motor inhibition related positivity in no-go trials. Thus, the presence of Stop-P3 during the imagery condition in the current study ruled out an interpretation based on differences in motor processes. The analysis of stop trials free from the influence of the go signal also allowed us to conclude that the larger amplitude of P3 in stop trials is not due to the summation of activity evoked by two consecutive stimuli.

In the present study, we failed to replicate the larger N2 to stop than to go trials reported in previous studies. However, in a comparison of Stop N2 in successful and US trials, Ramautar et al. ( 2006 ) found a larger N2 in unsuccessful trials and indicated that Stop N2 resembled an Error-Related Negativity. Our findings are consistent with this interpretation, as we observed larger N2 amplitude in US trials than in SS trials.

Despite the above contributions, there are some limitations in the experimental design. First, the role of M1 in inhibitory control remains unclear. Further research is required to establish whether M1 acts as a passive receptor of inhibitory signals from other components of the executive control network or assumes an active function in the suppression of motor processing. Since previous studies have considered beta rebound a correlate of inhibition or of a return to an idling state after termination of a motor program (Neuper and Pfurtscheller, 1996), even after motor imagery (Pfurtscheller et al., 2005; Solis-Escalante et al., 2012), it would be interesting to analyze beta rebound in stop trials, which requires longer ISIs than those used in the present study. Our design was also unable to clarify whether Stop-P3 reflects actual inhibitory control or, alternatively, evaluation of the inhibitory process. As Huster et al. (2013) have argued, this process is initiated and controlled before the culmination of P3, suggesting that the component may reflect evaluation of the inhibitory outcome. Similarly, Wessel and Aron (2015) proposed using the onset of the frontocentral P3 as a better indicator of response inhibition. Finally, we could not rule out an attentional effect of the red arrow (stop signal) on the N2 and P3 amplitudes. Future studies should include a condition in which go trials contain a second stimulus as a confirmatory signal (e.g., a green arrow indicating to continue with the motor program).

Overall, the present findings add to previous cumulative evidence for the existence of a shared neural substrate between imagined and executed movements (Stavrinou et al., 2007 ), supporting the functional equivalence hypothesis (Jeannerod, 2001 ). The results provide a consistent picture: similar lateralized activity (LRP, mu and beta ERD) was observed both in overt and covert responses, with a similar time course (identical LRP onset, and mu and beta ERD temporal windows) and pattern of task-modulation (differences between go and stop trials). Thus, the results suggest that the mental imagery of a motor plan leads to activation of the same network, with similar temporal dynamics and constraints. The use for the first time of a motor imagery paradigm during performance of a stop-signal task allowed us to further conclude that a similar inhibitory network may be also active during covert execution of the task.

As stated above, this finding could contribute to the development of more sophisticated BCI and provides the scientific basis for understanding the efficacy of motor imagery techniques for improving performance in professional athletes (Jones and Stuth, 1997 ; Ridderinkhof and Brass, 2015 ) or motor rehabilitation in patients with neurological lesions (Dickstein and Deutsch, 2007 ; Zimmermann-Schlatter et al., 2008 ).

Author Contributions

SG-A was responsible for the first manuscript draft, manuscript editing and the statistical analyses, and contributed to literature review, and manuscript review. FMB contributed to task design, EEG recording and literature review. AJG-V was responsible for EEG processing and figures, and contributed to literature review and manuscript review. MTC-P was responsible for task design and contributed to statistical analyses, literature and manuscript review. All the authors contributed to interpretation of results.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This research was funded by Spanish Ministerio de Economía y Competitividad (Reference PSI2013-43594-R). AJG-V was supported by a research grant from the Fundación Ramón Dominguez. The authors would like to thank Fermín Pintos for his support developing a script for extracting the behavioral data.

1 The SSRTs represent the point at which the stop process finishes and can be estimated taking into account the go RT distribution and the observed probability of successful/unsuccessful inhibitions to the stop signal for a given SSD (go-stop interval). Using the staircase-tracking algorithm facilitates the estimation of the SSRT since that probability is around 0.50. Thus, it is possible to calculate SSRT by subtracting the observed mean SSD from the observed mean go RT (Logan and Cowan, 1984 ; Logan et al., 1997 ).
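As a purely illustrative arithmetic example of this estimate (the numbers are hypothetical, not taken from Table 1):

```python
# With the tracking algorithm holding successful inhibition near 50%, SSRT can be
# approximated as mean go RT minus mean SSD (Logan and Cowan, 1984). Hypothetical values:
mean_go_rt_ms = 480.0   # illustrative mean go reaction time
mean_ssd_ms = 250.0     # illustrative mean stop-signal delay
ssrt_ms = mean_go_rt_ms - mean_ssd_ms
print(ssrt_ms)          # 230.0 ms
```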

  • Band G. P. H., van Boxtel G. J. M. (1999). Inhibitory motor control in stop paradigms: review and reinterpretation of neural mechanisms . Acta Psychol. 101 , 179–211. 10.1016/s0001-6918(99)00005-0 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Blankertz B., Tangermann M., Müller K. R. (2012). “ BCI applications for the general population ,” in Brain-Computer Interfaces: Principles and Practice , eds Wolpaw J., Wolpaw E. W. (New York, NY: Oxford University Press; ), 363–372. [ Google Scholar ]
  • Böcker K. B. E., Brunia C. H. M., Cluitmans P. J. M. (1994a). A spatio-temporal dipole model of the readiness potential in humans. I. Finger movement . Electroencephalogr. Clin. Neurophysiol. 91 , 275–285. 10.1016/0013-4694(94)90191-0 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Böcker K. B. E., Brunia C. H. M., Cluitmans P. J. M. (1994b). A spatio-temporal dipole model of the readiness potential in humans. II. Foot movement . Electroencephalogr. Clin. Neurophysiol. 91 , 286–294. 10.1016/0013-4694(94)90192-9 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Brunia C. H. M. (1980). “ What is wrong with legs in motor preparation? ” in Motivation, Motor and Sensory Processes of the Brain: Electrical Potentials, Behaviour and Clinical Use , eds Kornhuber H. H., Deecke L. (Amsterdam: Elsevier; ), 232–236. [ PubMed ] [ Google Scholar ]
  • Carrillo-de-la-Peña M. T., Galdo-Alvarez S., Lastra-Barreira C. (2008). Equivalent is not equal: primary motor cortex (MI) activation during motor imagery and execution of sequential movements . Brain Res. 1226 , 134–143. 10.1016/j.brainres.2008.05.089 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Carrillo-de-la-Peña M. T., Lastra-Barreira C., Galdo-Alvarez S. (2006). Limb (hand vs. foot) and response conflict have similar effects on event-related potentials (ERPs) recorded during motor imagery and overt execution . Eur. J. Neurosci. 24 , 635–643. 10.1111/j.1460-9568.2006.04926.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Carter C. S., Braver T. S., Barch D. M., Botvinick M. M., Noll D., Cohen J. D. (1998). Anterior cingulate cortex, error detection and the online monitoring of performance . Science 280 , 747–749. 10.1126/science.280.5364.747 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cespón J., Galdo-Álvarez S., Díaz F. (2012). The Simon effect modulates N2cc and LRP but not the N2pc component . Int. J. Psychophysiol. 84 , 120–129. 10.1016/j.ijpsycho.2012.01.019 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cohen M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. Cambridge, MA: MIT Press. [ Google Scholar ]
  • Coles M. G. (1989). Modern mind-brain reading: psychophysiology, physiology and cognition . Psychophysiology 26 , 251–269. 10.1111/j.1469-8986.1989.tb01916.x [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cunnington R., Iansek R., Bradshaw J. L., Phillips J. G. (1996). Movement-related potentials associated with movement preparation and motor imagery . Exp. Brain Res. 111 , 429–436. 10.1007/bf00228732 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Decety J., Philippon B., Ingvar D. H. (1988). rCBF landscapes during motor performance and motor ideation of a graphic gesture . Eur. Arch. Psychiatry Neurol. Sci. 238 , 33–38. 10.1007/bf00381078 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • De Jong R., Coles M. G. H., Logan G. D., Gratton G. (1990). In search of the point of no return: the control of response processes . J. Exp. Psychol. Hum. Percept. Perform. 16 , 164–182. 10.1037/0096-1523.16.1.164 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Delorme A., Makeig S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis . J. Neurosci. Methods 134 , 9–21. 10.1016/j.jneumeth.2003.10.009 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dickstein R., Deutsch J. E. (2007). Motor imagery in physical therapist practice . Phys. Ther. 87 , 942–953. 10.2522/ptj.20060331 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dimoska A., Johnstone S. J., Barry R. J. (2006). The auditory-evoked N2 and P3 components in the stop-signal task: indices of inhibition, response-conflict or error-detection? Brain Cogn. 62 , 98–112. 10.1016/j.bandc.2006.03.011 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dimoska A., Johnstone S. J., Barry R. J., Clarke A. R. (2003). Inhibitory motor control in children with attention-deficit/hyperactivity disorder: event-related potentials in the stop-signal paradigm . Biol. Psychiatry 54 , 1345–1354. 10.1016/s0006-3223(03)00703-0 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Donkers F. C. L., van Boxtel G. J. M. (2004). The N2 in go/no-go tasks reflects conflict monitoring not response inhibition . Brain Cogn. 56 , 165–176. 10.1016/j.bandc.2004.04.005 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Enriquez-Geppert S., Konrad C., Pantev C., Huster R. J. (2010). Conflict and inhibition differentially affect the N200/P300 complex in a combined go/nogo and stop-signal task . Neuroimage 51 , 877–887. 10.1016/j.neuroimage.2010.02.043 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ersland L., Rosén G., Lundervold A., Smievoll A. I., Tillung T., Sundberg H., et al.. (1996). Phantom limb imaginary fingertapping causes primary motor cortex activation: an fMRI study . Neuroreport 8 , 207–210. 10.1097/00001756-199612200-00042 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Galdo-Alvarez S., Carrillo-de-la-Peña M. T. (2004). ERP evidence of MI activation without motor response execution . Neuroreport 15 , 2067–2070. 10.1097/00001756-200409150-00014 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Gerardin E., Sirigu A., Lehéricy S., Poline J. B., Gaymard B., Marsault C., et al.. (2000). Partially overlapping neural networks for real and imagined hand movements . Cereb. Cortex 10 , 1093–1104. 10.1093/cercor/10.11.1093 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Gherri E., Eimer M. (2010). Manual response preparation disrupts spatial attention: an electrophysiological investigation of links between action and attention . Neuropsychologia 48 , 961–969. 10.1016/j.neuropsychologia.2009.11.017 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • González-Villar A. J., Bonilla F. M., Carrillo-de-la-Peña M. T. (2016). When the brain simulates stopping: neural activity recorded during real and imagined stop-signal tasks . Cogn. Affect. Behav. Neurosci. [Epub ahead of print]. 10.3758/s13415-016-0434-3 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hohlefeld F. U., Nikulin V. V., Curio G. (2011). Visual stimuli evoke rapid activation (120 ms) of sensorimotor cortex for overt but not for covert movements . Brain Res. 1368 , 185–195. 10.1016/j.brainres.2010.10.035 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Huster R. J., Enriquez-Geppert S., Lavallee C. F., Falkenstein M., Herrmann C. S. (2013). Electroencephalography of response inhibition tasks: functional networks and cognitive contributions . Int. J. Psychophysiol. 87 , 217–233. 10.1016/j.ijpsycho.2012.08.001 [ PubMed ] [ CrossRef ] [ Google Scholar ]

RELATED ARTICLES AND EXCERPTS

  1. Equivalence Testing for Psychological Research: A Tutorial

    Psychologists must be able to test both for the presence of an effect and for the absence of an effect. In addition to testing against zero, researchers can use the two one-sided tests (TOST) procedure to test for equivalence and reject the presence of a smallest effect size of interest (SESOI). The TOST procedure can be used to determine if an observed effect is surprisingly small, given that ...

  2. Equivalence Tests: A Practical Primer for t Tests, Correlations, and

    It is statistically impossible to support the hypothesis that a true effect size is exactly zero. What is possible in a frequentist hypothesis testing framework is to statistically reject effects large enough to be deemed worthwhile. When researchers want to argue for the absence of an effect that is large enough to be worthwhile to examine, they can test for equivalence (Wellek, 2010).

  3. Equivalence Testing for Psychological Research: A Tutorial

    Equivalence testing can be used to test whether an observed effect is surprisingly small, assuming that a meaningful effect exists in the population (see, e.g., Goertzen & Cribbie, 2010; Meyners, 2012; Quertemont, 2011; Rogers, Howard, & Vessey, 1993). The test is a simple variation of widely used null-hypothesis significance tests. (A minimal TOST sketch appears after this list.)

  4. Equivalence Tests

    There have been previous attempts to introduce equivalence testing to psychology (Quertemont, 2011; Rogers, Howard, & Vessey, 1993) ... This is comparable to the large sample sizes that are required to reject a true but small effect when the null hypothesis is a null effect. Equivalence tests require slightly larger sample sizes than traditional null ...

  5. PDF Equivalence Testing for Psychological Research: A Tutorial

    ... result would falsify this hypothesis. Equivalence testing can be used to test whether an observed effect is surprisingly small, assuming a meaningful effect exists in the population. The test is a simple variation of widely used null hypothesis significance tests (NHST). To understand the idea behind equivalence tests, it is useful ...

  6. Equivalence Testing for Psychological Research: A Tutorial

    ... considered surprising, given that most effects in psychology ... To evaluate effects where we cannot reject the null hypothesis, we tested for equivalence against an interval of (−0.001, 0.001) ...

  7. Equivalence Tests: A Practical Primer for t Tests, Correlations, and

    This practical primer with accompanying spreadsheet and R package enables psychologists to easily perform equivalence tests (and power analyses) by setting equivalence bounds based on standardized effect sizes and provides recommendations to prespecify equivalence bounds. Extending your statistical tool kit with equivalence tests is an easy way ...

  8. Equivalence Testing for Psychological Research: A Tutorial

    The equivalence test (c) tests if the null hypothesis that an effect is at least as small as Δ_L or at least as large as Δ_U can be rejected. The inferiority test (d) tests if the null ...

  9. Equivalence testing for linear regression.

    The null hypothesis for an equivalence test can, therefore, be defined as the hypothesis that the effect lies outside the equivalence interval (i.e., is at least as extreme as the equivalence bounds). In other words, ... In psychology research and in the social sciences more broadly, the practice of equivalence testing is relatively new but is "rapidly expanding" (Koh & Cribbie, 2013).

  10. B-Value and Empirical Equivalence Bound: A New Procedure of Hypothesis

    where S_1^2 and S_2^2 are the sample variances of the two groups, and n_1 and n_2 are the sample sizes. The hypothesis test is based on whether [L_0, U_0] covers zero. However, when 0 ∈ [L_0, U_0], by the usual testing logic one cannot directly conclude equivalence of the two groups. As suggested in Seaman and Serlin [11], an equivalence test can be conducted in order to evaluate equivalence. (A confidence-interval sketch of this approach appears after this list.)

  11. Recommendations for applying tests of equivalence

    Researchers in psychology reliably select traditional null hypothesis significance tests (e.g., Student's t test), regardless of whether the research hypothesis relates to whether the group means are equivalent or whether the group means are different. Tests of equivalence, which have been popular i …

  12. PDF Theory and intervention equivalence and bias: New constructs ...

    STEP in the practice of psychology. Types of Equivalence and Bias. Equivalence. As others have claimed (van de Vijver, 2001; Ægisdóttir et al., 2008), equivalence is a significant concept ... the translation and adaptation of the Contact Hypothesis (Allport, 1954), later known as the Intergroup Contact Theory (Pettigrew, 1998). This hypothesis stipulates ...

  13. Testing of Hypothesis in Equivalence and Non Inferiority Trials-A

    The null hypothesis in this situation becomes that the performance of the drug is less than −d, and it is tested for rejection with a one-sided test in order to support the alternative that performance is greater than −d. This is unlike an equivalence trial, where performance greater than +d (i.e., superiority) is also part of the null hypothesis and the alternative was ...

  14. Social Psychological and Equivalence Tests: A Practical Primer for

    Keywords: research methods, equivalence testing, null hypothesis significance testing, power analysis. Scientists should be able to provide support for the null hypothesis. A limitation of the widespread use of traditional significance tests, where the null hypothesis is that the true effect size is zero, is that the absence of an effect can be ...

  15. Effect sizes for equivalence testing: Incorporating the equivalence

    In recent years, it has been gaining traction in the field of psychology. For example, equivalence tests have been used to test for gender similarities in intelligence ... To address such research questions (e.g., mean group similarity), researchers often opt to use traditional null hypothesis significance testing (NHST) methods ...

  16. British Journal of Mathematical and Statistical Psychology

    Equivalence tests are an alternative to traditional difference-based tests for demonstrating a lack of association between two variables. While there are several recent studies investigating equivalence tests for comparing means, little research has been conducted on equivalence methods for evaluating the equivalence or similarity of two correlation coefficients or two regression coefficients.

  17. Equivalence Testing for Psychological Research: A Tutorial

    In addition to testing against zero, researchers can use the two one-sided tests (TOST) procedure to test for equivalence and reject the presence of a smallest effect size of interest (SESOI). The TOST procedure can be used to determine if an observed effect is surprisingly small, given that a true effect at least as extreme as the SESOI exists.

  18. Equivalent statistics for a one-sample t -test

    Equivalence of statistics. For a one-sample t-test, we have a null hypothesis, given as H_0: μ = μ_0, where μ is a population mean and μ_0 is a specific value, and an alternative hypothesis, denoted as H_A: μ ≠ μ_0. (One can also consider directional alternative hypotheses.) These hypotheses, along with their equivalence-testing and non-inferiority counterparts, are written out after this list.

  19. Equivalence Testing With TOSTER

    Any science that wants to be taken seriously needs to be able to provide support for the null hypothesis. I often see people switching over from frequentist statistics, when effects are significant, to the use of Bayes factors in order to be able to provide support for the null hypothesis. ... I've created my first R package, TOSTER (as in Two One-Sided ...

  20. Noninferiority and Equivalence Designs: Issues and Implications for

    Equivalence designs, which use a two-sided test, pose a similar question, but also allow for the possibility that the novel intervention is no better than the standard one. ... Both confidence interval and hypothesis testing approaches will be used in the final analysis.

  21. Functional equivalence or behavioural matching? A critical reflection

    Motor imagery, or the mental rehearsal of actions in the absence of physical movement, is an increasingly popular construct in fields such as neuroscience, cognitive psychology and sport psychology. Unfortunately, few models of motor imagery have been postulated to date. Nevertheless, based on the hypothesis of functional equivalence between imagery, perception and motor execution, Holmes and ...

  22. The Happy-Productive Worker Hypothesis: Factor or Fallacy?

    The "happy-productive worker" hypothesis, which suggests that happy people are more productive, is a fallacy. Research has found contradictory evidence that challenges the notion of a direct, reciprocal ...

  23. Research Challenge Grant winners announced

    Eight research projects involving 16 faculty members are the winners of the inaugural Dean's Research Challenge Grants. Proposals submitted this year were required to focus on one of two themes—"Equity" or "Environment." Recipients will deliver presentations about their research projects at a College of Arts and Sciences Research Symposium on Friday, October 11.

  24. Functional Equivalence of Imagined vs. Real Performance of an

    Although the recording of brain activity during the covert performance of an inhibitory task could provide additional support for the functional equivalence hypothesis, as far as we know, there is only one study comparing actual and imagined performance of a stop-signal task (González-Villar et al., 2016). Using auditory stimuli as stop ...
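
To make the TOST procedure described in entries 1, 3, 5, 8, and 17 above concrete, here is a minimal sketch. It is not the TOSTER package's own code: it assumes two independent groups, equal-variance pooling, and equivalence bounds given in raw (unstandardized) units; the data and bounds are purely illustrative.

```python
# Minimal TOST (two one-sided tests) sketch for two independent means.
# Assumptions: equal-variance pooling, equivalence bounds in raw units.
import numpy as np
from scipy import stats

def tost_two_sample(x, y, low, high):
    """Return the two one-sided p values and the overall TOST p value (their max)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    t_low = (diff - low) / se     # tests H0: true diff <= low  (reject for large t_low)
    t_high = (diff - high) / se   # tests H0: true diff >= high (reject for small t_high)
    p_low = stats.t.sf(t_low, df)
    p_high = stats.t.cdf(t_high, df)
    return p_low, p_high, max(p_low, p_high)

rng = np.random.default_rng(1)
a = rng.normal(100, 15, 60)   # illustrative data only
b = rng.normal(100, 15, 60)
p1, p2, p_tost = tost_two_sample(a, b, low=-5.0, high=5.0)
print(f"p_lower = {p1:.4f}, p_upper = {p2:.4f}, TOST p = {p_tost:.4f}")
# Equivalence is concluded (at level alpha) only if both one-sided tests
# reject, i.e., only if max(p_lower, p_upper) < alpha.
```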
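
The confidence-interval formulation quoted in entry 10 can be sketched the same way: compute an interval [L_0, U_0] for the mean difference and conclude equivalence only if it lies entirely inside the equivalence bounds. The sketch below assumes a Welch (unequal-variance) standard error and a 90% interval, which corresponds to two one-sided tests at α = .05; the bound d is illustrative.

```python
# Confidence-interval view of equivalence: a 90% CI for the mean difference
# that lies entirely inside (-d, +d) corresponds to rejecting both one-sided
# TOST hypotheses at alpha = .05. Welch standard error is assumed here.
import numpy as np
from scipy import stats

def equivalence_ci(x, y, d, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    vx, vy = x.var(ddof=1) / nx, y.var(ddof=1) / ny
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / (vx**2 / (nx - 1) + vy**2 / (ny - 1))
    crit = stats.t.ppf(1 - alpha, df)        # 1 - 2*alpha coverage, i.e., a 90% CI
    lo, hi = diff - crit * se, diff + crit * se
    return (lo, hi), (-d < lo) and (hi < d)  # True -> conclude equivalence

rng = np.random.default_rng(7)
(L0, U0), equivalent = equivalence_ci(rng.normal(50, 10, 80),
                                      rng.normal(51, 10, 80), d=4.0)
print(f"[L0, U0] = ({L0:.2f}, {U0:.2f}); equivalent: {equivalent}")
```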
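
Finally, the hypothesis formulations discussed in entries 13 and 18 can be written compactly. This is a generic sketch: the bounds δ and d are placeholders, not values taken from the cited papers.

```latex
\[
  \text{Difference test: } H_0 : \mu = \mu_0 \quad\text{vs.}\quad H_A : \mu \neq \mu_0
\]
\[
  \text{Equivalence test: } H_0 : \mu - \mu_0 \le -\delta \ \text{ or } \ \mu - \mu_0 \ge \delta,
  \qquad H_A : -\delta < \mu - \mu_0 < \delta
\]
\[
  \text{Non-inferiority test: } H_0 : \theta \le -d \quad\text{vs.}\quad H_A : \theta > -d
\]
```

Rejecting both one-sided components of the equivalence null supports the claim that the effect lies inside (−δ, +δ), whereas rejecting the non-inferiority null supports only that the effect is no worse than −d.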