The JASP guidelines for conducting and reporting a Bayesian analysis

  • Theoretical Review
  • Open access
  • Published: 09 October 2020
  • Volume 28, pages 813–826 (2021)


  • Johnny van Doorn 1 ,
  • Don van den Bergh 1 ,
  • Udo Böhm 1 ,
  • Fabian Dablander 1 ,
  • Koen Derks 2 ,
  • Tim Draws 1 ,
  • Alexander Etz 3 ,
  • Nathan J. Evans 1 ,
  • Quentin F. Gronau 1 ,
  • Julia M. Haaf 1 ,
  • Max Hinne 1 ,
  • Šimon Kucharský 1 ,
  • Alexander Ly 1 , 4 ,
  • Maarten Marsman 1 ,
  • Dora Matzke 1 ,
  • Akash R. Komarlu Narendra Gupta 1 ,
  • Alexandra Sarafoglou 1 ,
  • Angelika Stefan 1 ,
  • Jan G. Voelkel 5 &
  • Eric-Jan Wagenmakers 1  


Despite the increasing popularity of Bayesian inference in empirical research, few practical guidelines provide detailed recommendations for how to apply Bayesian procedures and interpret the results. Here we offer specific guidelines for four different stages of Bayesian statistical reasoning in a research setting: planning the analysis, executing the analysis, interpreting the results, and reporting the results. The guidelines for each stage are illustrated with a running example. Although the guidelines are geared towards analyses performed with the open-source statistical software JASP, most guidelines extend to Bayesian inference in general.


In recent years, Bayesian inference has become increasingly popular, both in statistical science and in applied fields such as psychology, biology, and econometrics (e.g., Andrews & Baguley, 2013 ; Vandekerckhove, Rouder, & Kruschke, 2018 ). For the pragmatic researcher, the adoption of the Bayesian framework brings several advantages over the standard framework of frequentist null-hypothesis significance testing (NHST), including (1) the ability to obtain evidence in favor of the null hypothesis and discriminate between “absence of evidence” and “evidence of absence” (Dienes, 2014 ; Keysers, Gazzola, & Wagenmakers, 2020 ); (2) the ability to take into account prior knowledge to construct a more informative test (Gronau, Ly, & Wagenmakers, 2020 ; Lee & Vanpaemel, 2018 ); and (3) the ability to monitor the evidence as the data accumulate (Rouder, 2014 ). However, the relative novelty of conducting Bayesian analyses in applied fields means that there are no detailed reporting standards, and this in turn may frustrate the broader adoption and proper interpretation of the Bayesian framework.

Several recent statistical guidelines include information on Bayesian inference, but these guidelines are either minimalist (Appelbaum et al., 2018 ; The BaSiS group, 2001 ), focus only on relatively complex statistical tests (Depaoli & Schoot, 2017 ), are too specific to a certain field (Spiegelhalter, Myles, Jones, & Abrams, 2000 ; Sung et al., 2005 ), or do not cover the full inferential process (Jarosz & Wiley, 2014 ). The current article aims to provide a general overview of the different stages of the Bayesian reasoning process in a research setting. Specifically, we focus on guidelines for analyses conducted in JASP (JASP Team, 2019 ; jasp-stats.org ), although these guidelines can be generalized to other software packages for Bayesian inference. JASP is an open-source statistical software program with a graphical user interface that features both Bayesian and frequentist versions of common tools such as the t test, the ANOVA, and regression analysis (e.g., Marsman & Wagenmakers, 2017 ; Wagenmakers et al., 2018 ).

We discuss four stages of analysis: planning, executing, interpreting, and reporting. These stages and their individual components are summarized in Table  1 . In order to provide a concrete illustration of the guidelines for each of the four stages, each section features a data set reported by Frisby and Clatworthy ( 1975 ). This data set concerns the time it took two groups of participants to see a figure hidden in a stereogram—one group received advance visual information about the scene (i.e., the VV condition), whereas the other group did not (i.e., the NV condition). Footnote 1 Three additional examples (mixed ANOVA, correlation analysis, and a t test with an informed prior) are provided in an online appendix at https://osf.io/nw49j/ . Throughout the paper, we present three boxes that provide additional technical discussion. These boxes, while not strictly necessary, may prove useful to readers interested in greater detail.

Stage 1: Planning the analysis

Specifying the goal of the analysis.

We recommend that researchers carefully consider their goal, that is, the research question that they wish to answer, prior to the study (Jeffreys, 1939 ). When the goal is to ascertain the presence or absence of an effect, we recommend a Bayes factor hypothesis test (see Box 1). The Bayes factor compares the predictive performance of two hypotheses. This underscores an important point: in the Bayes factor testing framework, hypotheses cannot be evaluated until they are embedded in fully specified models with a prior distribution and likelihood (i.e., in such a way that they make quantitative predictions about the data). Thus, when we refer to the predictive performance of a hypothesis, we implicitly refer to the accuracy of the predictions made by the model that encompasses the hypothesis (Etz, Haaf, Rouder, & Vandekerckhove, 2018 ).

When the goal is to determine the size of the effect, under the assumption that it is present, we recommend plotting the posterior distribution or summarizing it by a credible interval (see Box 2). Testing and estimation are not mutually exclusive and may be used in sequence; for instance, one may first use a test to ascertain that the effect exists, and then continue to estimate the size of the effect.

Box 1. Hypothesis testing

The principled approach to Bayesian hypothesis testing is by means of the Bayes factor (e.g., Etz & Wagenmakers, 2017 ; Jeffreys, 1939 ; Ly, Verhagen, & Wagenmakers, 2016 ; Wrinch & Jeffreys, 1921 ). The Bayes factor quantifies the relative predictive performance of two rival hypotheses, and it is the degree to which the data demand a change in beliefs concerning the hypotheses’ relative plausibility (see Equation  1 ). Specifically, the first term in Equation  1 corresponds to the prior odds, that is, the relative plausibility of the rival hypotheses before seeing the data. The second term, the Bayes factor, indicates the evidence provided by the data. The third term, the posterior odds, indicates the relative plausibility of the rival hypotheses after having seen the data.
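Written out, Equation 1 is the odds form of Bayes' rule, with the three terms in the order described above:

$$
\underbrace{\frac{p({\mathscr{H}}_{1})}{p({\mathscr{H}}_{0})}}_{\text{prior odds}} \; \times \; \underbrace{\frac{p(D \mid {\mathscr{H}}_{1})}{p(D \mid {\mathscr{H}}_{0})}}_{\text{Bayes factor } \mathrm{BF}_{10}} \; = \; \underbrace{\frac{p({\mathscr{H}}_{1} \mid D)}{p({\mathscr{H}}_{0} \mid D)}}_{\text{posterior odds}}
$$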

The subscript in the Bayes factor notation indicates which hypothesis is supported by the data. BF 10 indicates the Bayes factor in favor of \({\mathscr{H}}_{1}\) over \({\mathscr{H}}_{0}\) , whereas BF 01 indicates the Bayes factor in favor of \({\mathscr{H}}_{0}\) over \({\mathscr{H}}_{1}\) . Specifically, BF 10 = 1/BF 01 . Larger values of BF 10 indicate more support for \({\mathscr{H}}_{1}\) . Bayes factors range from 0 to \(\infty \) , and a Bayes factor of 1 indicates that both hypotheses predicted the data equally well. This principle is further illustrated in Figure  4 .
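To make the idea of comparing predictive performance concrete, the sketch below computes a Bayes factor for a hypothetical binomial example (not the t test used in this paper) by evaluating each hypothesis's marginal likelihood: a point null θ = 0.5 versus a uniform prior on θ.

```python
from scipy import integrate, stats

# Hypothetical illustration: k successes in n trials.
# H0: theta = 0.5 (point null); H1: theta ~ Beta(1, 1), i.e. uniform on [0, 1].
k, n = 60, 100

# Marginal likelihood of the data under each hypothesis.
m0 = stats.binom.pmf(k, n, 0.5)
m1, _ = integrate.quad(
    lambda t: stats.binom.pmf(k, n, t) * stats.beta.pdf(t, 1, 1), 0, 1
)

bf10 = m1 / m0   # relative predictive performance of H1 over H0
bf01 = 1 / bf10  # and the reciprocal, in favor of H0
```

Note that both hypotheses are fully specified models here: each makes a quantitative prediction for the observed data, and the Bayes factor is simply the ratio of those predictions.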

Box 2. Parameter estimation

For Bayesian parameter estimation, interest centers on the posterior distribution of the model parameters. The posterior distribution reflects the relative plausibility of the parameter values after prior knowledge has been updated by means of the data. Specifically, we start the estimation procedure by assigning the model parameters a prior distribution that reflects the relative plausibility of each parameter value before seeing the data. The information in the data is then used to update the prior distribution to the posterior distribution. Parameter values that predicted the data relatively well receive a boost in plausibility, whereas parameter values that predicted the data relatively poorly suffer a decline (Wagenmakers, Morey, & Lee, 2016 ). Equation  2 illustrates this principle. The first term indicates the prior beliefs about the values of parameter 𝜃 . The second term is the updating factor: for each value of 𝜃 , the quality of its prediction is compared to the average quality of the predictions over all values of 𝜃 . The third term indicates the posterior beliefs about 𝜃 .
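Written out, Equation 2 is Bayes' rule for the parameter, with the updating factor comparing each value's predictive quality to the average predictive quality over all values:

$$
\underbrace{p(\theta)}_{\text{prior}} \; \times \; \underbrace{\frac{p(D \mid \theta)}{p(D)}}_{\text{updating factor}} \; = \; \underbrace{p(\theta \mid D)}_{\text{posterior}}, \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
$$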

The posterior distribution can be plotted or summarized by an x % credible interval. An x % credible interval contains x % of the posterior mass. Two popular ways of creating a credible interval are the highest density credible interval, which is the narrowest interval containing the specified mass, and the central credible interval, which is created by cutting off \(\frac {100-x}{2}\%\) from each of the tails of the posterior distribution.
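Both interval types are straightforward to compute from posterior samples. The sketch below uses hypothetical draws from a normal distribution as a stand-in for posterior samples from a real model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples; in practice these come from your model.
posterior = rng.normal(loc=0.5, scale=0.2, size=100_000)

# Central 95% credible interval: cut 2.5% of the mass from each tail.
central = np.percentile(posterior, [2.5, 97.5])

# 95% highest-density interval: the narrowest window containing 95% of the samples.
x = np.sort(posterior)
n = len(x)
m = int(np.ceil(0.95 * n))
widths = x[m - 1:] - x[: n - m + 1]
i = int(np.argmin(widths))
hdi = (x[i], x[i + m - 1])
```

For a symmetric, unimodal posterior like this one the two intervals nearly coincide; for skewed posteriors the highest-density interval is shifted toward the mode.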

Specifying the statistical model.

The functional form of the model (i.e., the likelihood; Etz, 2018 ) is guided by the nature of the data and the research question. For instance, if interest centers on the association between two variables, one may specify a bivariate normal model in order to conduct inference on Pearson’s correlation parameter ρ . The statistical model also determines which assumptions ought to be satisfied by the data. For instance, the statistical model might assume the dependent variable to be normally distributed. Violations of assumptions may be addressed at different points in the analysis, such as the data preprocessing steps discussed below, or by planning to conduct robust inferential procedures as a contingency plan.

The next step in model specification is to determine the sidedness of the procedure. For hypothesis testing, this means deciding whether the procedure is one-sided (i.e., the alternative hypothesis dictates a specific direction of the population effect) or two-sided (i.e., the alternative hypothesis dictates that the effect can be either positive or negative). The choice of one-sided versus two-sided depends on the research question at hand and this choice should be theoretically justified prior to the study. For hypothesis testing it is usually the case that the alternative hypothesis posits a specific direction. In Bayesian hypothesis testing, a one-sided hypothesis yields a more diagnostic test than a two-sided alternative (e.g., Jeffreys, 1961 ; Wetzels, Raaijmakers, Jakab, & Wagenmakers, 2009 , p.283). Footnote 2

For parameter estimation, we recommend always using the two-sided model rather than the one-sided model: when a positive one-sided model is specified but the observed effect turns out to be negative, all of the posterior mass will nevertheless remain on the positive values, falsely suggesting the presence of a small positive effect.

The next step in model specification concerns the type and spread of the prior distribution, including its justification. For the most common statistical models (e.g., correlations, t tests, and ANOVA), certain “default” prior distributions are available that can be used in cases where prior knowledge is absent, vague, or difficult to elicit (for more information, see Ly et al.,, 2016 ). These priors are default options in JASP. In cases where prior information is present, different “informed” prior distributions may be specified. However, the more the informed priors deviate from the default priors, the stronger becomes the need for a justification (see the informed t test example in the online appendix at https://osf.io/ybszx/ ). Additionally, the robustness of the result to different prior distributions can be explored and included in the report. This is an important type of robustness check because the choice of prior can sometimes impact our inferences, such as in experiments with small sample sizes or missing data. In JASP, Bayes factor robustness plots show the Bayes factor for a wide range of prior distributions, allowing researchers to quickly examine the extent to which their conclusions depend on their prior specification. An example of such a plot is given later in Figure  7 .

Specifying data preprocessing steps.

Dependent on the goal of the analysis and the statistical model, different data preprocessing steps might be taken. For instance, if the statistical model assumes normally distributed data, a transformation to normality (e.g., the logarithmic transformation) might be considered (e.g., Draper & Cox, 1969 ). Other points to consider at this stage are when and how outliers may be identified and accounted for, which variables are to be analyzed, and whether further transformation or combination of data are necessary. These decisions can be somewhat arbitrary, and yet may exert a large influence on the results (Wicherts et al., 2016 ). In order to assess the degree to which the conclusions are robust to arbitrary modeling decisions, it is advisable to conduct a multiverse analysis (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016 ). Preferably, the multiverse analysis is specified at study onset. A multiverse analysis can easily be conducted in JASP, but doing so is not the goal of the current paper.

Specifying the sampling plan.

As may be expected from a framework for the continual updating of knowledge, Bayesian inference allows researchers to monitor evidence as the data come in, and stop whenever they like, for any reason whatsoever. Thus, strictly speaking there is no Bayesian need to pre-specify sample size at all (e.g., Berger & Wolpert, 1988 ). Nevertheless, Bayesians are free to specify a sampling plan if they so desire; for instance, one may commit to stop data collection as soon as BF 10 ≥ 10 or BF 01 ≥ 10. This approach can also be combined with a maximum sample size ( N ), where data collection stops when either the maximum N or the desired Bayes factor is obtained, whichever comes first (for examples, see Matzke et al., 2015; Wagenmakers et al., 2015).
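A stopping rule of this sort can be sketched in a few lines. The simulation below is a hypothetical simplification, not JASP's t test: it uses a normal model with known variance and a standard normal prior on the mean, for which the Bayes factor is available in closed form via the Savage–Dickey density ratio.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def bf10_fn(x):
    # Hypothetical model: data ~ N(mu, 1); H0: mu = 0; H1: mu ~ N(0, 1).
    # Savage-Dickey: BF01 = posterior density at mu = 0 / prior density at mu = 0.
    n = len(x)
    post_mean = n * np.mean(x) / (n + 1)
    post_sd = np.sqrt(1 / (n + 1))
    bf01 = stats.norm.pdf(0, post_mean, post_sd) / stats.norm.pdf(0, 0, 1)
    return 1 / bf01

threshold = 10    # stop as soon as BF10 >= 10 or BF01 >= 10
n_max = 500       # maximum sample size, whichever comes first
true_mu = 0.8     # simulated truth (unknown to the analyst)

x = []
bf = 1.0
while len(x) < n_max:
    x.append(rng.normal(true_mu, 1.0))
    if len(x) >= 2:
        bf = bf10_fn(np.array(x))
        if bf >= threshold or bf <= 1 / threshold:
            break
```

Because the Bayes factor is unaffected by the stopping rule, the evidence can be monitored after every observation without penalty.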

In order to examine what sampling plans are feasible, researchers can conduct a Bayes factor design analysis (Schönbrodt & Wagenmakers, 2018 ; Stefan, Gronau, Schönbrodt, & Wagenmakers, 2019 ), a method that shows the predicted outcomes for different designs and sampling plans. Of course, when the study is observational and the data are available ‘en bloc’, the sampling plan becomes irrelevant in the planning stage.

Stereogram example

First, we consider the research goal, which was to determine if participants who receive advance visual information exhibit a shorter fuse time (Frisby & Clatworthy, 1975 ). A Bayes factor hypothesis test can be used to quantify the evidence that the data provide for and against the hypothesis that an effect is present. Should this test reveal support in favor of the presence of the effect, then we have grounds for a follow-up analysis in which the size of the effect is estimated.

Second, we specify the statistical model. The study focus is on the difference in performance between two between-subjects conditions, suggesting that a two-sample t test on the fuse times is appropriate. The main measure of the study is a reaction time variable, which can for various reasons be non-normally distributed (Lo & Andrews, 2015 ; but see Schramm & Rouder, 2019 ). If our data show signs of non-normality we will conduct two alternative analyses: a t test on the log-transformed fuse time data and a non-parametric t test (i.e., the Mann–Whitney U test), which is robust to non-normality and unaffected by the log-transformation of the fuse times.

For hypothesis testing, we compare the null hypothesis (i.e., advance visual information has no effect on fuse times) to a one-sided alternative hypothesis (i.e., advance visual information shortens the fuse times), in line with the directional nature of the original research question. The rival hypotheses are thus \({\mathscr{H}}_{0}: \delta = 0\) and \({\mathscr{H}}_{+}: \delta > 0\) , where δ is the standardized effect size (i.e., the population version of Cohen’s d ), \({\mathscr{H}}_{0}\) denotes the null hypothesis, and \({\mathscr{H}}_{+}\) denotes the one-sided alternative hypothesis (note the ‘+’ in the subscript). For parameter estimation (under the assumption that the effect exists), we use the two-sided t test model and plot the posterior distribution of δ . This distribution can also be summarized by a 95 % central credible interval.

We complete the model specification by assigning prior distributions to the model parameters. Since we have little prior knowledge about the topic, we select a default prior option for the two-sample t test, that is, a Cauchy distribution Footnote 3 with spread r set to \({1}/{\sqrt {2}}\) . Since we specified a one-sided alternative hypothesis, the prior distribution is truncated at zero, such that only positive effect size values are allowed. The robustness of the Bayes factor to this prior specification can be easily assessed in JASP by means of a Bayes factor robustness plot.
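For illustration, this one-sided default prior can be written down directly; the scale r = 1/√2 is the value named above, and the factor of 2 renormalizes the Cauchy density after truncation at zero.

```python
import numpy as np
from scipy import integrate, stats

# Default prior scale for the Bayesian two-sample t test described in the text.
r = 1 / np.sqrt(2)

def one_sided_prior(delta):
    # H+: truncate the Cauchy(0, r) prior at zero; doubling renormalizes it.
    return 2 * stats.cauchy.pdf(delta, loc=0, scale=r) if delta >= 0 else 0.0

# Sanity check: the truncated prior still integrates to 1 over delta >= 0.
total, _ = integrate.quad(one_sided_prior, 0, np.inf)
```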

Since the data are already available, we do not have to specify a sampling plan. The original data set has a total sample size of 103, from which 25 participants were eliminated due to failing an initial stereo-acuity test, leaving 78 participants (43 in the NV condition and 35 in the VV condition). The data are available online at https://osf.io/5vjyt/ .

Stage 2: Executing the analysis

Before executing the primary analysis and interpreting the outcome, it is important to confirm that the intended analyses are appropriate and the models are not grossly misspecified for the data at hand. In other words, it is strongly recommended to examine the validity of the model assumptions (e.g., normally distributed residuals or equal variances across groups). Such assumptions may be checked by plotting the data, inspecting summary statistics, or conducting formal assumption tests (but see Tijmstra, 2018 ).

A powerful demonstration of the dangers of failing to check the assumptions is provided by Anscombe’s quartet (Anscombe, 1973 ; see Fig.  1 ). The quartet consists of four fictitious data sets of equal size that each have the same observed Pearson’s product moment correlation r , and therefore lead to the same inferential result both in a frequentist and a Bayesian framework. However, visual inspection of the scatterplots immediately reveals that three of the four data sets are not suitable for a linear correlation analysis, and the statistical inference for these three data sets is meaningless or even misleading. This example highlights the adage that conducting a Bayesian analysis does not safeguard against general statistical malpractice—the Bayesian framework is as vulnerable to violations of assumptions as its frequentist counterpart. In cases where assumptions are violated, an ordinal or non-parametric test can be used, and the parametric results should be interpreted with caution.

figure 1

Model misspecification is also a problem for Bayesian analyses. The four scatterplots in the top panel show Anscombe’s quartet (Anscombe, 1973 ); the bottom panel shows the corresponding inference, which is identical for all four scatter plots. Except for the leftmost scatterplot, all data violate the assumptions of the linear correlation analysis in important ways

Once the quality of the data has been confirmed, the planned analyses can be carried out. JASP offers a graphical user interface for both frequentist and Bayesian analyses. JASP 0.10.2 features the following Bayesian analyses: the binomial test, the Chi-square test, the multinomial test, the t test (one-sample, paired sample, two-sample, Wilcoxon rank-sum, and Wilcoxon signed-rank tests), A/B tests, ANOVA, ANCOVA, repeated measures ANOVA, correlations (Pearson’s ρ and Kendall’s τ ), linear regression, and log-linear regression. After loading the data into JASP, the desired analysis can be conducted by dragging and dropping variables into the appropriate boxes; tick marks can be used to select the desired output.

The resulting output (i.e., figures and tables) can be annotated and saved as a .jasp file. Output can then be shared with peers, with or without the real data in the .jasp file; if the real data are added, reviewers can easily reproduce the analyses, conduct alternative analyses, or insert comments.

In order to check for violations of the assumptions of the t test, the top row of Fig.  2 shows boxplots and Q-Q plots of the dependent variable fuse time, split by condition. Visual inspection of the boxplots suggests that the variances of the fuse times may not be equal (observed standard deviations of the NV and VV groups are 8.085 and 4.802, respectively), suggesting that the equal variance assumption is unlikely to hold. There also appear to be a number of potential outliers in both groups. Moreover, the Q-Q plots show that the normality assumption of the t test is untenable here. Thus, in line with our analysis plan we will apply the log-transformation to the fuse times. The standard deviations of the log-transformed fuse times in the groups are roughly equal (observed standard deviations are 0.814 and 0.818 in the NV and the VV group, respectively); the Q-Q plots in the bottom row of Fig.  2 also look acceptable for both groups and there are no apparent outliers. However, it seems prudent to assess the robustness of the result by also conducting the Bayesian Mann–Whitney U test (van Doorn, Ly, Marsman, & Wagenmakers, 2020 ) on the fuse times.
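As a sketch of this kind of assumption check outside JASP, the snippet below generates hypothetical right-skewed reaction times (simulated, not the Frisby and Clatworthy data) and compares normality tests on the raw and log-transformed values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated right-skewed "fuse times" (log-normal), standing in for real data.
fuse = rng.lognormal(mean=2.0, sigma=0.8, size=78)

# Shapiro-Wilk normality test on the raw and on the log-transformed values:
# a small p-value signals that the normality assumption is untenable.
p_raw = stats.shapiro(fuse).pvalue
p_log = stats.shapiro(np.log(fuse)).pvalue
```

For data like these, the raw values fail the normality check while the log-transformed values pass, mirroring the reasoning applied to the stereogram data above.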

figure 2

Descriptive plots allow a visual assessment of the assumptions of the t test for the stereogram data. The top row shows descriptive plots for the raw fuse times, and the bottom row shows descriptive plots for the log-transformed fuse times. The left column shows boxplots, including the jittered data points, for each of the experimental conditions. The middle and right columns show Q-Q plots of the dependent variable, split by experimental condition. Here we see that the log-transformed dependent variable is more appropriate for the t test, due to its distribution and absence of outliers. Figures from JASP

Following the assumption check, we proceed to execute the analyses in JASP. For hypothesis testing, we obtain a Bayes factor using the one-sided Bayesian two-sample t test. Figure  3 shows the JASP user interface for this procedure. For parameter estimation, we obtain a posterior distribution and credible interval, using the two-sided Bayesian two-sample t test. The relevant boxes for the various plots were ticked, and an annotated .jasp file was created with all of the relevant analyses: the one-sided Bayes factor hypothesis tests, the robustness check, the posterior distribution from the two-sided analysis, and the one-sided results of the Bayesian Mann–Whitney U test. The .jasp file can be found at https://osf.io/nw49j/ . The next section outlines how these results are to be interpreted.

figure 3

JASP menu for the Bayesian two-sample t test. The left input panel offers the analysis options, including the specification of the alternative hypothesis and the selection of plots. The right output panel shows the corresponding analysis output. The prior and posterior plot is explained in more detail in Fig.  6 . The input panel specifies the one-sided analysis for hypothesis testing; a two-sided analysis for estimation can be obtained by selecting “Group 1 ≠ Group 2” under “Alt. Hypothesis”

Stage 3: Interpreting the results

With the analysis outcome in hand, we are ready to draw conclusions. We first discuss the scenario of hypothesis testing, where the goal typically is to conclude whether an effect is present or absent. Then, we discuss the scenario of parameter estimation, where the goal is to estimate the size of the population effect, assuming it is present. When both hypothesis testing and estimation procedures have been planned and executed, there is no predetermined order for their interpretation. One may adhere to the adage “only estimate something when there is something to be estimated” (Wagenmakers et al., 2018 ) and first test whether an effect is present, and then estimate its size (assuming the test provided sufficiently strong evidence against the null), or one may first estimate the magnitude of an effect, and then quantify the degree to which this magnitude warrants a shift in plausibility away from or toward the null hypothesis (but see Box 3).

If the goal of the analysis is hypothesis testing, we recommend using the Bayes factor. As described in Box 1, the Bayes factor quantifies the relative predictive performance of two rival hypotheses (Wagenmakers et al., 2016 ). Importantly, the Bayes factor is a relative metric of the hypotheses’ predictive quality. For instance, if BF 10 = 5, this means that the data are 5 times more likely under \({\mathscr{H}}_{1}\) than under \({\mathscr{H}}_{0}\) . However, a Bayes factor in favor of \({\mathscr{H}}_{1}\) does not mean that \({\mathscr{H}}_{1}\) predicts the data well. As Figure  1 illustrates, \({\mathscr{H}}_{1}\) provides a dreadful account of three out of four data sets, yet is still supported relative to \({\mathscr{H}}_{0}\) .

There can be no hard Bayes factor bound (other than zero and infinity) for accepting or rejecting a hypothesis wholesale, but there have been some attempts to classify the strength of evidence that different Bayes factors provide (e.g., Jeffreys, 1939 ; Kass & Raftery, 1995 ). One such classification scheme is shown in Figure  4 . Several magnitudes of the Bayes factor are visualized as a probability wheel, where the proportion of red to white is determined by the degree of evidence in favor of \({\mathscr{H}}_{0}\) and \({\mathscr{H}}_{1}\) . Footnote 4 In line with Jeffreys, a Bayes factor between 1 and 3 is considered weak evidence, a Bayes factor between 3 and 10 is considered moderate evidence, and a Bayes factor greater than 10 is considered strong evidence. Note that these classifications should only be used as general rules of thumb to facilitate communication and interpretation of evidential strength. Indeed, one of the merits of the Bayes factor is that it offers an assessment of evidence on a continuous scale.
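These rules of thumb are easy to encode; the helper below is an illustrative mapping, not part of JASP, and should be read with the same caveat as the classification scheme itself.

```python
def evidence_label(bf):
    # Heuristic Jeffreys-style labels; rules of thumb, not decision boundaries.
    bf = max(bf, 1 / bf)  # strength of evidence, regardless of direction
    if bf < 3:
        return "weak"
    if bf < 10:
        return "moderate"
    return "strong"
```

For example, BF10 = 5 and BF10 = 1/5 both count as "moderate" evidence, only in opposite directions.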

figure 4

A graphical representation of a Bayes factor classification table. As the Bayes factor deviates from 1, which indicates equal support for \({\mathscr{H}}_{0}\) and \({\mathscr{H}}_{1}\) , more support is gained for either \({\mathscr{H}}_{0}\) or \({\mathscr{H}}_{1}\) . Bayes factors between 1 and 3 are considered to be weak, Bayes factors between 3 and 10 are considered moderate, and Bayes factors greater than 10 are considered strong evidence. The Bayes factors are also represented as probability wheels, where the ratio of white (i.e., support for \({\mathscr{H}}_{0}\) ) to red (i.e., support for \({\mathscr{H}}_{1}\) ) surface is a function of the Bayes factor. The probability wheels further underscore the continuous scale of evidence that Bayes factors represent. These classifications are heuristic and should not be misused as an absolute rule for all-or-nothing conclusions

When the goal of the analysis is parameter estimation, the posterior distribution is key (see Box 2). The posterior distribution is often summarized by a location parameter (point estimate) and uncertainty measure (interval estimate). For point estimation, the posterior median (reported by JASP), mean, or mode can be reported, although these do not contain any information about the uncertainty of the estimate. In order to capture the uncertainty of the estimate, an x % credible interval can be reported. An x % credible interval [ L , U ] has an x % probability that the true parameter lies between L and U (an interpretation that is often wrongly attributed to frequentist confidence intervals; see Morey, Hoekstra, Rouder, Lee, & Wagenmakers, 2016 ). For example, if we obtain a 95 % credible interval of [− 1,0.5] for effect size δ , we can be 95 % certain that the true value of δ lies between − 1 and 0.5, assuming that the alternative hypothesis we specify is true. In case one does not want to make this assumption, one can present the unconditional posterior distribution instead. For more discussion on this point, see Box 3.

Box 3. Conditional vs. unconditional inference.

A widely accepted view on statistical inference is neatly summarized by Fisher ( 1925 ), who states that “it is a useful preliminary before making a statistical estimate \(\dots \) to test if there is anything to justify estimation at all” (p. 300; see also Haaf, Ly, & Wagenmakers, 2019 ). In the Bayesian framework, this stance naturally leads to posterior distributions conditional on \({\mathscr{H}}_{1}\) , which ignores the possibility that the null value could be true. Generally, when we say “prior distribution” or “posterior distribution” we are following convention and referring to such conditional distributions. However, only presenting conditional posterior distributions can potentially be misleading in cases where the null hypothesis remains relatively plausible after seeing the data. A general benefit of Bayesian analysis is that one can compute an unconditional posterior distribution for the parameter using model averaging (e.g., Clyde, Ghosh, & Littman, 2011 ; Hinne, Gronau, Bergh, & Wagenmakers, 2020 ). An unconditional posterior distribution for a parameter accounts for both the uncertainty about the parameter within any one model and the uncertainty about the model itself, providing an estimate of the parameter that is a compromise between the candidate models (for more details see Hoeting, Madigan, Raftery, & Volinsky, 1999 ). In the case of a t test, which features only the null and the alternative hypothesis, the unconditional posterior consists of a mixture between a spike under \({\mathscr{H}}_{0}\) and a bell-shaped posterior distribution under \({\mathscr{H}}_{1}\) (Rouder, Haaf, & Vandekerckhove, 2018 ; van den Bergh, Haaf, Ly, Rouder, & Wagenmakers, 2019 ). Figure  5 illustrates this approach for the stereogram example.
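The mixture construction can be mimicked with samples. The snippet below is a hypothetical sketch: a normal distribution stands in for the conditional posterior under \({\mathscr{H}}_{1}\), and p(H1 | D) = 0.7 is the posterior model probability from the stereogram example.

```python
import numpy as np

rng = np.random.default_rng(2)
p_h1 = 0.7       # posterior model probability p(H1 | D) from the stereogram example
n = 100_000

# Hypothetical conditional posterior of delta under H1 (a normal stand-in).
under_h1 = rng.normal(0.5, 0.2, size=n)

# Unconditional posterior: a spike at the null value with probability
# p(H0 | D) = 0.3, mixed with the conditional posterior under H1.
from_h1 = rng.random(n) < p_h1
unconditional = np.where(from_h1, under_h1, 0.0)
```

About 30% of the resulting mass sits exactly at zero, so summaries of this distribution automatically discount the effect size by the remaining plausibility of the null.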

figure 5

Updating the unconditional prior distribution to the unconditional posterior distribution for the stereogram example. The left panel shows the unconditional prior distribution, which is a mixture between the prior distributions under \({\mathscr{H}}_{0}\) and \({\mathscr{H}}_{1}\) . The prior distribution under \({\mathscr{H}}_{0}\) is a spike at the null value, indicated by the dotted line ; the prior distribution under \({\mathscr{H}}_{1}\) is a Cauchy distribution, indicated by the gray mass . The mixture proportion is determined by the prior model probabilities \(p({\mathscr{H}}_{0})\) and \(p({\mathscr{H}}_{1})\) . The right panel shows the unconditional posterior distribution, after updating the prior distribution with the data D . This distribution is a mixture between the posterior distributions under \({\mathscr{H}}_{0}\) and \({\mathscr{H}}_{1}\) , where the mixture proportion is determined by the posterior model probabilities \(p({\mathscr{H}}_{0} \mid D)\) and \(p({\mathscr{H}}_{1} \mid D)\) . Since \(p({\mathscr{H}}_{1} \mid D) = 0.7\) (i.e., the data provide support for \({\mathscr{H}}_{1}\) over \({\mathscr{H}}_{0}\) ), about 70% of the unconditional posterior mass is comprised of the posterior mass under \({\mathscr{H}}_{1}\) , indicated by the gray mass . Thus, the unconditional posterior distribution provides information about plausible values for δ , while taking into account the uncertainty of \({\mathscr{H}}_{1}\) being true. In both panels, the dotted line and gray mass have been rescaled such that the height of the dotted line and the highest point of the gray mass reflect the prior ( left ) and posterior ( right ) model probabilities

Common pitfalls in interpreting Bayesian results

Bayesian veterans sometimes argue that Bayesian concepts are intuitive and easier to grasp than frequentist concepts. However, in our experience there exist persistent misinterpretations of Bayesian results. Here we list five:

The Bayes factor does not equal the posterior odds; in fact, the posterior odds are equal to the Bayes factor multiplied by the prior odds (see also Equation  1 ). These prior odds reflect the relative plausibility of the rival hypotheses before seeing the data (e.g., 50/50 when both hypotheses are equally plausible, or 80/20 when one hypothesis is deemed to be four times more plausible than the other). For instance, a proponent and a skeptic may differ greatly in their assessment of the prior plausibility of a hypothesis; their prior odds differ, and, consequently, so will their posterior odds. However, as the Bayes factor is the updating factor from prior odds to posterior odds, proponent and skeptic ought to change their beliefs to the same degree (assuming they agree on the model specification, including the parameter prior distributions).

Prior model probabilities (i.e., prior odds) and parameter prior distributions play different conceptual roles. Footnote 5 The former concerns prior beliefs about the hypotheses, for instance that both \({\mathscr{H}}_{0}\) and \({\mathscr{H}}_{1}\) are equally plausible a priori. The latter concerns prior beliefs about the model parameters within a model, for instance that all values of Pearson’s ρ are equally likely a priori (i.e., a uniform prior distribution on the correlation parameter). Prior model probabilities and parameter prior distributions can be combined to one unconditional prior distribution as described in Box 3 and Fig.  5 .

The Bayes factor and credible interval have different purposes and can yield different conclusions. Specifically, the typical credible interval for an effect size is conditional on \({\mathscr{H}}_{1}\) being true and quantifies the strength of an effect, assuming it is present (but see Box 3); in contrast, the Bayes factor quantifies evidence for the presence or absence of an effect. A common misconception is to conduct a “hypothesis test” by inspecting only credible intervals. Berger ( 2006 , p. 383) remarks: “[...] Bayesians cannot test precise hypotheses using confidence intervals. In classical statistics one frequently sees testing done by forming a confidence region for the parameter, and then rejecting a null value of the parameter if it does not lie in the confidence region. This is simply wrong if done in a Bayesian formulation (and if the null value of the parameter is believable as a hypothesis).”

The strength of evidence in the data is easy to overstate: a Bayes factor of 3 provides some support for one hypothesis over another, but should not warrant the confident all-or-none acceptance of that hypothesis.

The results of an analysis always depend on the questions that were asked. Footnote 6 For instance, choosing a one-sided analysis over a two-sided analysis will impact both the Bayes factor and the posterior distribution. For an illustration of this, see Fig.  6 for a comparison between one-sided and two-sided results.

In order to avoid these and other pitfalls, we recommend that researchers who are doubtful about the correct interpretation of their Bayesian results solicit expert advice (for instance through the JASP forum at http://forum.cogsci.nl ).
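The first pitfall above — confusing the Bayes factor with the posterior odds — reduces to a one-line identity: posterior odds equal the Bayes factor times the prior odds. A minimal sketch, using the BF value from the running example (the proponent/skeptic prior probabilities are invented for illustration):

```python
def posterior_prob_h1(bf10, prior_prob_h1=0.5):
    """Posterior probability of H1 given a Bayes factor BF10 and a prior
    probability for H1: posterior odds = BF10 * prior odds (Equation 1)."""
    prior_odds = prior_prob_h1 / (1 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1 + posterior_odds)

# A proponent (50/50) and a skeptic (20/80) update by the same Bayes factor,
# but arrive at different posterior probabilities:
print(round(posterior_prob_h1(4.567, 0.5), 3))  # → 0.82
print(round(posterior_prob_h1(4.567, 0.2), 3))  # → 0.533
```

With equal prior probabilities the result also gives the "proportion of red" in JASP's probability wheel, i.e., BF/(1 + BF).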

For hypothesis testing, the results of the one-sided t test are presented in Fig.  6 a. The resulting BF + 0 is 4.567, indicating moderate evidence in favor of \({\mathscr{H}}_{+}\) : the data are approximately 4.6 times more likely under \({\mathscr{H}}_{+}\) than under \({\mathscr{H}}_{0}\) . To assess the robustness of this result, we also planned a Mann–Whitney U test. The resulting BF + 0 is 5.191, qualitatively similar to the Bayes factor from the parametric test. Additionally, we could have specified a multiverse analysis where data exclusion criteria (i.e., exclusion vs. no exclusion), the type of test (i.e., Mann–Whitney U vs. t test), and data transformations (i.e., log-transformed vs. raw fuse times) are varied. Typically in multiverse analyses these three decisions would be crossed, resulting in at least eight different analyses. However, in our case some of these analyses are implausible or redundant. First, because the Mann–Whitney U test is unaffected by the log transformation, the log-transformed and raw fuse times yield the same results. Second, due to the multiple assumption violations, the t test model for raw fuse times is misspecified and hence we do not trust the validity of its result. Third, we do not know which observations were excluded by Frisby and Clatworthy ( 1975 ). Consequently, only two of these eight analyses are relevant. Footnote 7 Furthermore, a more comprehensive multiverse analysis could also consider the Bayes factors from two-sided tests (i.e., BF 10 = 2.323 for the t test and BF 10 = 2.557 for the Mann–Whitney U test). However, these tests are not in line with the theory under consideration, as they answer a different theoretical question (see “Specifying the statistical model” in the Planning section).
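The claim that the Mann–Whitney U test is unaffected by the log transformation follows from its rank-based nature: any strictly monotone transformation leaves the ranks, and hence the U statistic, unchanged. A self-contained sketch with made-up fuse times (not the Frisby & Clatworthy data):

```python
import math

def mann_whitney_u(x, y):
    """Mann–Whitney U statistic: the number of pairs (xi, yj) with
    xi > yj, counting ties as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical fuse times (seconds) for two small groups:
nv = [4.5, 10.2, 1.8, 22.0, 7.7]
vv = [2.1, 3.0, 6.4, 1.2, 5.5]

u_raw = mann_whitney_u(nv, vv)
u_log = mann_whitney_u([math.log(t) for t in nv],
                       [math.log(t) for t in vv])
assert u_raw == u_log  # ranks, and thus U, survive the monotone transform
print(u_raw)  # → 19.0
```

The same invariance underlies the redundancy argument in the multiverse analysis above: the log-transformed and raw branches of the rank-based test collapse into one.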

figure 6

Bayesian two-sample t test for the parameter δ . The probability wheel on top visualizes the evidence that the data provide for the two rival hypotheses. The two gray dots indicate the prior and posterior density at the test value (Dickey & Lientz, 1970 ; Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010 ). The median and the 95 % central credible interval of the posterior distribution are shown in the top right corner. The left panel shows the one-sided procedure for hypothesis testing and the right panel shows the two-sided procedure for parameter estimation. Both figures from JASP

For parameter estimation, the results of the two-sided t test are presented in Fig.  6 b. The 95 % central credible interval for δ is relatively wide, ranging from 0.046 to 0.904: this means that, under the assumption that the effect exists and given the model we specified, we can be 95 % certain that the true value of δ lies between 0.046 and 0.904. In conclusion, there is moderate evidence for the presence of an effect, and large uncertainty about its size.
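The mechanics of such a central credible interval can be sketched with a grid approximation. This is a simplified stand-in, not JASP's computation: the observed effect size and its standard error below are assumed numbers chosen to land in the same ballpark as the example, and the likelihood for δ is approximated as normal, combined with a Cauchy prior of width \(r = 1/\sqrt{2}\):

```python
import numpy as np

# Assumed inputs (illustration only): observed standardized effect,
# its standard error, and the Cauchy prior width.
d_obs, se, r = 0.5, 0.22, 1 / np.sqrt(2)

delta = np.linspace(-3, 3, 60001)
step = delta[1] - delta[0]
prior = 1 / (np.pi * r * (1 + (delta / r) ** 2))       # Cauchy(0, r) density
lik = np.exp(-0.5 * ((d_obs - delta) / se) ** 2)       # normal likelihood
post = prior * lik
post /= post.sum() * step                              # normalize on the grid

cdf = np.cumsum(post) * step
median = delta[np.searchsorted(cdf, 0.5)]
lo = delta[np.searchsorted(cdf, 0.025)]
hi = delta[np.searchsorted(cdf, 0.975)]
print(round(float(median), 2), round(float(lo), 2), round(float(hi), 2))
```

The posterior median and the 2.5% and 97.5% quantiles read off the cumulative grid give the point estimate and the central 95% credible interval; the prior shrinks the interval slightly toward zero relative to the raw likelihood.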

Stage 4: Reporting the results

For increased transparency, and to allow a skeptical assessment of the statistical claims, we recommend presenting an elaborate analysis report including relevant tables, figures, assumption checks, and background information. The extent to which this needs to be done in the manuscript itself depends on context. Ideally, an annotated .jasp file is created that presents the full results and analysis settings. The resulting file can then be uploaded to the Open Science Framework (OSF; https://osf.io ), where it can be viewed by collaborators and peers, even without having JASP installed. Note that the .jasp file retains the settings that were used to create the reported output. Analyses not conducted in JASP should mimic such transparency, for instance by uploading an R script. In this section, we list several desiderata for reporting, both for hypothesis testing and parameter estimation. What to include in the report depends on the goal of the analysis, regardless of whether the result is conclusive or not.

In all cases, we recommend providing a complete description of the prior specification (i.e., the type of distribution and its parameter values) and, especially for informed priors, a justification for the choices that were made. When reporting a specific analysis, we advise referring to the relevant background literature for details. In JASP, the relevant references for specific tests can be copied from the drop-down menus in the results panel.

When the goal of the analysis is hypothesis testing, it is key to outline which hypotheses are compared by clearly stating each hypothesis and including the corresponding subscript in the Bayes factor notation. Furthermore, we recommend including, if available, the Bayes factor robustness check discussed in the section on planning (see Fig.  7 for an example). This check provides an assessment of the robustness of the Bayes factor under different prior specifications: if the qualitative conclusions do not change across a range of different plausible prior distributions, this indicates that the analysis is relatively robust. If this plot is unavailable, the robustness of the Bayes factor can be checked manually by specifying several different prior distributions (see the mixed ANOVA analysis in the online appendix at https://osf.io/wae57/ for an example). When data come in sequentially, it may also be of interest to examine the sequential Bayes factor plot, which shows the evidential flow as a function of increasing sample size.
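A manual robustness check of this kind can be sketched in a simplified conjugate setting. The stand-in below replaces the Cauchy-prior t test with a normal prior on δ and a normal likelihood for the observed effect (all numbers are assumptions), so that both marginal likelihoods — and hence the Bayes factor — are available in closed form and can be recomputed for a range of prior widths:

```python
import math

def bf10_normal(d_obs, se, prior_sd):
    """BF10 for H0: delta = 0 versus H1: delta ~ N(0, prior_sd^2),
    with a normal likelihood d_obs ~ N(delta, se^2). In this conjugate
    toy model the marginal of d_obs under H1 is N(0, se^2 + prior_sd^2)."""
    def norm_pdf(x, sd):
        return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    m0 = norm_pdf(d_obs, se)                                   # marginal under H0
    m1 = norm_pdf(d_obs, math.sqrt(se ** 2 + prior_sd ** 2))   # marginal under H1
    return m1 / m0

# Recompute the Bayes factor across increasingly wide priors
# (assumed d_obs = 0.5, se = 0.22). BF10 shrinks as the prior widens
# (~3.51, 3.13, 2.52, 1.91) but stays above 1 throughout.
for s in (0.5, 1 / math.sqrt(2), 1.0, math.sqrt(2)):
    print(round(s, 2), round(bf10_normal(0.5, 0.22, s), 2))
```

This mirrors the message of Fig. 7: the numerical value of the Bayes factor depends on the prior width, so a conclusion is robust only if it survives a plausible range of widths.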

figure 7

The Bayes factor robustness plot. The maximum BF + 0 is attained when setting the prior width r to 0.38. The plot indicates BF + 0 for the user-specified prior ( \(r = {1}/{\sqrt {2}}\) ), wide prior ( r = 1), and ultrawide prior ( \(r = \sqrt {2}\) ). The evidence for the alternative hypothesis is relatively stable across a wide range of prior distributions, suggesting that the analysis is robust. However, the evidence in favor of \({\mathscr{H}}_{+}\) is not particularly strong and will not convince a skeptic

When the goal of the analysis is parameter estimation, it is important to present a plot of the posterior distribution, or report a summary, for instance through the median and a 95 % credible interval. Ideally, the results of the analysis are reported both graphically and numerically. This means that, when possible, a plot is presented that includes the posterior distribution, prior distribution, Bayes factor, 95 % credible interval, and posterior median. Footnote 8

Numeric results can be presented either in a table or in the main text. If relevant, we recommend reporting the results from both estimation and hypothesis testing. For some analyses, the results are based on a numerical algorithm, such as Markov chain Monte Carlo (MCMC), which yields an error percentage. If applicable and available, the error percentage ought to be reported too, to indicate the numeric robustness of the result. Lower values of the error percentage indicate greater numerical stability of the result. Footnote 9 In order to increase numerical stability, JASP includes an option to increase the number of samples for MCMC sampling when applicable.
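The logic behind the error percentage can be illustrated with a toy Monte Carlo estimate of a marginal likelihood. This is not JASP's algorithm — the binomial example, uniform prior, and all settings are assumptions — but it shows why sampling-based results fluctuate upon recomputation and why more samples reduce that fluctuation:

```python
import math
import random

def log_binom_pmf(theta, k, n):
    """Log probability of k successes in n trials at success rate theta."""
    return (math.log(math.comb(n, k)) + k * math.log(theta)
            + (n - k) * math.log(1 - theta))

def mc_marginal(k, n, draws, rng):
    """Monte Carlo estimate of p(data) = E_prior[p(data | theta)]
    under a uniform prior on theta."""
    total = 0.0
    for _ in range(draws):
        theta = rng.random()
        total += math.exp(log_binom_pmf(theta, k, n))
    return total / draws

rng = random.Random(2020)
k, n = 7, 10
# Repeat the estimation to see how much the answer wobbles; the relative
# standard deviation plays the role of an "error percentage".
estimates = [mc_marginal(k, n, 2000, rng) for _ in range(20)]
mean = sum(estimates) / len(estimates)
sd = (sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)) ** 0.5
error_pct = 100 * sd / mean
print(round(mean, 4), round(error_pct, 1))
```

For this model the exact marginal likelihood is 1/(n + 1) ≈ 0.0909, so the repeated estimates scatter around the true value; increasing `draws` shrinks the error percentage, just as increasing the number of MCMC samples does in JASP.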

This is an example report of the stereograms t test example:

Here we summarize the results of the Bayesian analysis for the stereogram data. For this analysis we used the Bayesian t test framework proposed by Jeffreys ( 1961 ; see also Rouder et al., 2009 ). We analyzed the data with JASP (JASP Team, 2019 ). An annotated .jasp file, including distribution plots, data, and input options, is available at https://osf.io/25ekj/ . Due to model misspecification (i.e., non-normality, presence of outliers, and unequal variances), we applied a log-transformation to the fuse times. This remedied the misspecification. To assess the robustness of the results, we also applied a Mann–Whitney U test. First, we discuss the results for hypothesis testing. The null hypothesis postulates that there is no difference in log fuse time between the groups and therefore \({\mathscr{H}}_{0}: \delta = 0\) . The one-sided alternative hypothesis states that only positive values of δ are possible, and assigns more prior mass to values closer to 0 than extreme values. Specifically, δ was assigned a Cauchy prior distribution with \(r ={1}/{\sqrt {2}}\) , truncated to allow only positive effect size values. Figure  6 a shows that the Bayes factor indicates evidence for \({\mathscr{H}}_{+}\) ; specifically, BF + 0 = 4.567, which means that the data are approximately 4.6 times more likely to occur under \({\mathscr{H}}_{+}\) than under \({\mathscr{H}}_{0}\) . This result indicates moderate evidence in favor of \({\mathscr{H}}_{+}\) . The error percentage is < 0.001 % , which indicates great stability of the numerical algorithm that was used to obtain the result. The Mann–Whitney U test yielded a qualitatively similar result, BF + 0 = 5.191. In order to assess the robustness of the Bayes factor to our prior specification, Fig.  7 shows BF + 0 as a function of the prior width r . Across a wide range of widths, the Bayes factor appears to be relatively stable, ranging from about 3 to 5. Second, we discuss the results for parameter estimation. 
Of interest is the posterior distribution of the standardized effect size δ (i.e., the population version of Cohen’s d , the standardized difference in mean fuse times). For parameter estimation, δ was assigned a Cauchy prior distribution with \(r ={1}/{\sqrt {2}}\) . Figure  6 b shows that the median of the resulting posterior distribution for δ equals 0.47 with a central 95% credible interval for δ that ranges from 0.046 to 0.904. If the effect is assumed to exist, there remains substantial uncertainty about its size, with values close to 0 having the same posterior density as values close to 1.

Limitations and challenges

The Bayesian toolkit for the empirical social scientist still has some limitations to overcome. First, for some frequentist analyses, the Bayesian counterpart has not yet been developed or implemented in JASP. Second, some analyses in JASP currently provide only a Bayes factor, and not a visual representation of the posterior distributions, for instance due to the multidimensional parameter space of the model. Third, some analyses in JASP are only available with a relatively limited set of prior distributions. However, these are not principled limitations, and the software is actively being developed to overcome them. When dealing with more complex models that go beyond staple analyses such as t tests, there exist a number of software packages that allow custom coding, such as JAGS (Plummer, 2003 ) or Stan (Carpenter et al., 2017 ). Another option for Bayesian inference is to code the analyses in a programming language such as R (R Core Team, 2018 ) or Python (van Rossum, 1995 ). This requires a certain degree of programming ability, but grants the user more flexibility. Popular packages for conducting Bayesian analyses in R are the BayesFactor package (Morey & Rouder, 2015 ) and the brms package (Bürkner, 2017 ), among others (see https://cran.r-project.org/web/views/Bayesian.html for a more exhaustive list). For Python, a popular package for Bayesian analyses is PyMC3 (Salvatier, Wiecki, & Fonnesbeck, 2016 ). The practical guidelines provided in this paper can largely be generalized to the application of these software programs.

Concluding comments

We have attempted to provide concise recommendations for planning, executing, interpreting, and reporting Bayesian analyses. These recommendations are summarized in Table  1 . Our guidelines focused on the standard analyses that are currently featured in JASP. When going beyond these analyses, some of the discussed guidelines will be easier to implement than others. However, the general process of transparent, comprehensive, and careful statistical reporting extends to all Bayesian procedures and indeed to statistical analyses across the board.

The variables are participant number, the time (in seconds) each participant needed to see the hidden figure (i.e., fuse time), experimental condition (VV = with visual information, NV = without visual information), and the log-transformed fuse time.

A one-sided alternative hypothesis makes a more risky prediction than a two-sided hypothesis. Consequently, if the data are in line with the one-sided prediction, the one-sided alternative hypothesis is rewarded with a greater gain in plausibility compared to the two-sided alternative hypothesis; if the data oppose the one-sided prediction, the one-sided alternative hypothesis is penalized with a greater loss in plausibility compared to the two-sided alternative hypothesis.

The fat-tailed Cauchy distribution is a popular default choice because it fulfills particular desiderata; see Jeffreys ( 1961 ), Liang et al. ( 2008 ), Ly et al. ( 2016 ), and Rouder, Speckman, Sun, Morey, and Iverson ( 2009 ) for details.

Specifically, the proportion of red is the posterior probability of \({\mathscr{H}}_{1}\) under a prior probability of 0.5; for a more detailed explanation and a cartoon see https://tinyurl.com/ydhfndxa

This confusion does not arise for the rarely reported unconditional distributions (see Box 3).

This is known as Jeffreys’s platitude: “The most beneficial result that I can hope for as a consequence of this work is that more attention will be paid to the precise statement of the alternatives involved in the questions asked. It is sometimes considered a paradox that the answer depends not only on the observations but on the question; it should be a platitude” (Jeffreys, 1939 , p.vi).

The Bayesian Mann–Whitney U test results and the results for the raw fuse times are in the .jasp file at https://osf.io/nw49j/ .

The posterior median is popular because it is robust to skewed distributions and invariant under smooth transformations of parameters, although other measures of central tendency, such as the mode or the mean, are also in common use.

We generally recommend error percentages below 20% as acceptable: a change of that size in the Bayes factor will typically not alter one’s qualitative conclusions. However, this threshold naturally increases with the magnitude of the Bayes factor. For instance, a Bayes factor of 10 with a 50% error percentage could be expected to fluctuate between 5 and 15 upon recomputation. This could be considered a large change. However, with a Bayes factor of 1000 a 50% reduction would still leave us with overwhelming evidence.

Andrews, M., & Baguley, T. (2013). Prior approval: The growth of Bayesian methods in psychology. British Journal of Mathematical and Statistical Psychology , 66 , 1–7.


Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician , 27 , 17–21.

Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist , 73 , 3–25.

Berger, J. O. (2006). Bayes factors. In S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic, & N. L. Johnson (Eds.) Encyclopedia of Statistical Sciences, vol. 1, 378-386, Hoboken, NJ, Wiley .

Berger, J. O., & Wolpert, R. L. (1988) The likelihood principle , (2nd edn.) Hayward (CA): Institute of Mathematical Statistics.

Bürkner, P.C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software , 80 , 1–28.

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., & et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software , 76 , 1–37.

Clyde, M. A., Ghosh, J., & Littman, M. L. (2011). Bayesian adaptive sampling for variable selection and model averaging. Journal of Computational and Graphical Statistics , 20 , 80–101.

Depaoli, S., & van de Schoot, R. (2017). Improving transparency and replication in Bayesian statistics: The WAMBS-checklist. Psychological Methods , 22 , 240–261.


Dickey, J. M., & Lientz, B. P. (1970). The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics , 41 , 214–226.

Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology , 5 , 781.


Draper, N. R., & Cox, D. R. (1969). On distributions and their transformation to normality. Journal of the Royal Statistical Society: Series B (Methodological) , 31 , 472–476.

Etz, A. (2018). Introduction to the concept of likelihood and its applications. Advances in Methods and Practices in Psychological Science , 1 , 60–69.

Etz, A., Haaf, J. M., Rouder, J. N., & Vandekerckhove, J. (2018). Bayesian inference and testing any hypothesis you can specify. Advances in Methods and Practices in Psychological Science , 1 (2), 281–295.

Etz, A., & Wagenmakers, E. J. (2017). J. B. S. Haldane’s contribution to the Bayes factor hypothesis test. Statistical Science , 32 , 313–329.

Fisher, R. (1925). Statistical methods for research workers . Edinburgh: Oliver & Boyd.

Frisby, J. P., & Clatworthy, J. L. (1975). Learning to see complex random-dot stereograms. Perception , 4 , 173–178.

Gronau, Q. F., Ly, A., & Wagenmakers, E. J. (2020). Informed Bayesian t tests. The American Statistician , 74 , 137–143.

Haaf, J., Ly, A., & Wagenmakers, E. (2019). Retire significance, but still test hypotheses. Nature , 567 (7749), 461.

Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E. J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science , 3 , 200–215.

Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science , 14 , 382–401.

JASP Team (2019). JASP (Version 0.9.2)[Computer software]. https://jasp-stats.org/ .

Jarosz, A. F., & Wiley, J. (2014). What are the odds? A practical guide to computing and reporting Bayes factors. Journal of Problem Solving , 7 , 2–9.

Jeffreys, H. (1939). Theory of probability (1st ed.). Oxford: Oxford University Press.

Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford: Oxford University Press.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association , 90 , 773–795.

Keysers, C., Gazzola, V., & Wagenmakers, E. J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience , 23 , 788–799.

Lee, M. D., & Vanpaemel, W. (2018). Determining informative priors for cognitive models. Psychonomic Bulletin & Review , 25 , 114–127.

Liang, F., Paulo, R., Molina, G., Clyde, M. A., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association , 103 , 410–423.

Lo, S., & Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology , 6 , 1171.

Ly, A., Verhagen, A. J., & Wagenmakers, E. J. (2016). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology , 72 , 19–32.

Marsman, M., & Wagenmakers, E. J. (2017). Bayesian benefits with JASP. European Journal of Developmental Psychology , 14 , 545–555.

Matzke, D., Nieuwenhuis, S., van Rijn, H., Slagter, H. A., van der Molen, M. W., & Wagenmakers, E. J. (2015). The effect of horizontal eye movements on free recall: A preregistered adversarial collaboration. Journal of Experimental Psychology: General , 144 , e1–e15.

Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E. J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review , 23 , 103–123.

Morey, R. D., & Rouder, J. N. (2015). BayesFactor 0.9.11-1. Comprehensive R Archive Network. http://cran.r-project.org/web/packages/BayesFactor/index.html .

Plummer, M. (2003). JAGS: A Program for analysis of Bayesian graphical models using Gibbs sampling. In K. Hornik, F. Leisch, & A. Zeileis (Eds.) Proceedings of the 3rd international workshop on distributed statistical computing, Vienna, Austria .

R Core Team (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. https://www.R-project.org/ .

Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review , 21 , 301–308.

Rouder, J. N., Haaf, J. M., & Vandekerckhove, J. (2018). Bayesian inference for psychology, part IV: Parameter estimation and Bayes factors. Psychonomic Bulletin & Review , 25 (1), 102–113.

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review , 16 , 225– 237.

Salvatier, J., Wiecki, T. V., & Fonnesbeck, C. (2016). Probabilistic programming in Python using PyMC3. PeerJ Computer Science , 2 , e55.

Schönbrodt, F.D., & Wagenmakers, E. J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review , 25 , 128–142.

Schramm, P., & Rouder, J. N. (2019). Are reaction time transformations really beneficial? PsyArXiv, March 5.

Spiegelhalter, D. J., Myles, J. P., Jones, D. R., & Abrams, K. R. (2000). Bayesian methods in health technology assessment: a review. Health Technology Assessment , 4 , 1–130.

Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science , 11 , 702–712.

Stefan, A. M., Gronau, Q. F., Schönbrodt, F.D., & Wagenmakers, E. J. (2019). A tutorial on Bayes factor design analysis using an informed prior. Behavior Research Methods , 51 , 1042–1058.

Sung, L., Hayden, J., Greenberg, M. L., Koren, G., Feldman, B. M., & Tomlinson, G. A. (2005). Seven items were identified for inclusion when reporting a Bayesian analysis of a clinical study. Journal of Clinical Epidemiology , 58 , 261–268.

The BaSiS group (2001). Bayesian standards in science: Standards for reporting of Bayesian analyses in the scientific literature. Internet. http://lib.stat.cmu.edu/bayesworkshop/2001/BaSis.html .

Tijmstra, J. (2018). Why checking model assumptions using null hypothesis significance tests does not suffice: a plea for plausibility. Psychonomic Bulletin & Review , 25 , 548–559.

Vandekerckhove, J., Rouder, J. N., & Kruschke, J. K. (Eds.) (2018). Beyond the new statistics: Bayesian inference for psychology [special issue]. Psychonomic Bulletin & Review , 25 .

Wagenmakers, E. J., Beek, T., Rotteveel, M., Gierholz, A., Matzke, D., Steingroever, H., & et al. (2015). Turning the hands of time again: A purely confirmatory replication study and a Bayesian analysis. Frontiers in Psychology: Cognition , 6 , 494.

Wagenmakers, E. J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology , 60 , 158–189.

Wagenmakers, E. J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., et al. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review , 25 , 58–76.

Wagenmakers, E. J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., et al. (2018). Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications. Psychonomic Bulletin & Review , 25 , 35–57.

Wagenmakers, E. J., Morey, R. D., & Lee, M. D. (2016). Bayesian benefits for the pragmatic researcher. Current Directions in Psychological Science , 25 , 169–176.

Wetzels, R., Raaijmakers, J. G. W., Jakab, E., & Wagenmakers, E. J. (2009). How to quantify support for and against the null hypothesis: A flexible winBUGS implementation of a default Bayesian t test. Psychonomic Bulletin & Review , 16 , 752– 760.

Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology , 7 , 1832.

Wrinch, D., & Jeffreys, H. (1921). On certain fundamental principles of scientific inquiry. Philosophical Magazine , 42 , 369– 390.

van Doorn, J., Ly, A., Marsman, M., & Wagenmakers, E. J. (2020). Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman’s rho. Journal of Applied Statistics , 1–23.

van Rossum, G. (1995). Python tutorial (Tech. Rep. No. CS-R9526). Amsterdam: Centrum voor Wiskunde en Informatica (CWI).

van den Bergh, D., Haaf, J. M., Ly, A., Rouder, J. N., & Wagenmakers, E. J. (2019). A cautionary note on estimating effect size. PsyArXiv. Retrieved from psyarxiv.com/h6pr8 .


Acknowledgments

We thank Dr. Simons, two anonymous reviewers, and the editor for comments on an earlier draft. Correspondence concerning this article may be addressed to Johnny van Doorn, University of Amsterdam, Department of Psychological Methods, Valckeniersstraat 59, 1018 XA Amsterdam, the Netherlands. E-mail may be sent to [email protected]. This work was supported in part by a Vici grant from the Netherlands Organization of Scientific Research (NWO) awarded to EJW (016.Vici.170.083) and an advanced ERC grant awarded to EJW (743086 UNIFY). DM is supported by a Veni Grant (451-15-010) from the NWO. MM is supported by a Veni Grant (451-17-017) from the NWO. AE is supported by a National Science Foundation Graduate Research Fellowship (DGE1321846). Centrum Wiskunde & Informatica (CWI) is the national research institute for mathematics and computer science in the Netherlands.

Author information

Authors and affiliations.

University of Amsterdam, Amsterdam, Netherlands

Johnny van Doorn, Don van den Bergh, Udo Böhm, Fabian Dablander, Tim Draws, Nathan J. Evans, Quentin F. Gronau, Julia M. Haaf, Max Hinne, Šimon Kucharský, Alexander Ly, Maarten Marsman, Dora Matzke, Akash R. Komarlu Narendra Gupta, Alexandra Sarafoglou, Angelika Stefan & Eric-Jan Wagenmakers

Nyenrode Business University, Breukelen, Netherlands

University of California, Irvine, California, USA

Alexander Etz

Centrum Wiskunde & Informatica, Amsterdam, Netherlands

Alexander Ly

Stanford University, Stanford, California, USA

Jan G. Voelkel


Contributions

JvD wrote the main manuscript. EJW, AE, JH, and JvD contributed to manuscript revisions. All authors reviewed the manuscript and provided feedback.

Corresponding author

Correspondence to Johnny van Doorn .

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Practices Statement

The data and materials are available at https://osf.io/nw49j/ .

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

van Doorn, J., van den Bergh, D., Böhm, U. et al. The JASP guidelines for conducting and reporting a Bayesian analysis. Psychon Bull Rev 28 , 813–826 (2021). https://doi.org/10.3758/s13423-020-01798-5


Published : 09 October 2020

Issue Date : June 2021

DOI : https://doi.org/10.3758/s13423-020-01798-5


  • Bayesian inference
  • Scientific reporting
  • Statistical software
Published: 14 January 2021

Bayesian statistics and modelling

Nature Reviews Methods Primers volume 1, Article number: 3 (2021)


Bayesian statistics is an approach to data analysis based on Bayes' theorem. This PrimeView provides an overview of how to select and establish priors, likelihoods, and posteriors. The use of Bayesian statistics and modelling across disciplines is discussed, along with appropriate sharing of data acquired using Bayesian methods.
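As a minimal illustration of the prior-to-posterior updating the PrimeView describes (our own sketch, not code from the primer): with a Beta prior on a proportion and a binomial likelihood, Bayes' theorem yields a closed-form Beta posterior.

```python
# Illustrative toy example (not from the primer): Beta prior + binomial data.
def beta_binomial_update(a, b, successes, trials):
    """Return the Beta posterior parameters after observing the data."""
    return a + successes, b + (trials - successes)

# Uniform prior Beta(1, 1); observe 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
posterior_mean = a_post / (a_post + b_post)  # (1 + 7) / (2 + 10) = 2/3
```

Conjugate pairs like this are the simplest case; for richer models the posterior has no closed form, which is where the sampling methods discussed elsewhere in this collection come in.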

Bayesian statistics and modelling. Nat Rev Methods Primers 1, 3 (2021). https://doi.org/10.1038/s43586-020-00003-0


Research Progress and Application of Bayesian Statistics

Special Issue Information

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section " Probability and Statistics ".

Deadline for manuscript submissions: closed (30 May 2024) | Viewed by 2988

Special Issue Editor

Dear Colleagues,

The Bayesian method provides a natural and coherent framework for statistical inference and prediction. Its usage was, however, limited in the past due to computational complexity. During the past few decades, there has been an increasing interest in Bayesian statistical modeling thanks to advances in computing capabilities and estimation methods such as Markov chain Monte Carlo (MCMC) and Integrated Nested Laplace Approximation (INLA). 
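The MCMC methods mentioned above can be sketched in a few lines. The following random-walk Metropolis sampler is an illustrative toy (our own, not tied to any paper in this issue): it draws from a target distribution using only evaluations of its unnormalized log-density.

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=1):
    """Minimal random-walk Metropolis sampler (illustrative sketch)."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)          # propose a local move
        logp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if math.log(rng.random()) < logp_prop - logp:
            x, logp = prop, logp_prop
        samples.append(x)
    return samples

# Target: standard normal log-density, up to an additive constant.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
mean = sum(draws) / len(draws)  # should be near 0
```

In practice one would use a mature implementation (e.g., Stan, PyMC, or INLA for latent Gaussian models) rather than hand-rolled samplers; the sketch only shows the core accept/reject mechanism.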

This Special Issue aims to provide a collection of papers highlighting recent advances in theories and applications using Bayesian statistics. Bayesian methods have been widely used across different disciplines, and as such, this Special Issue welcomes contributions from different fields, such as medicine, epidemiology, engineering, economics, and business. Submissions can be in the form of original research or reviews. Examples of areas of interest include but are not limited to missing data handling; variable/feature selection; time-series analysis; comparison of different estimation methods (e.g., variants of MCMC methods and INLA); comparison between Bayesian and classical methods using real-world examples; analysis of big data. 

Dr. Chao Wang Guest Editor


  • Bayesian statistics
  • Markov chain Monte Carlo

Published Papers (2 papers)


Bayesian statistics in medical research: an intuitive alternative to conventional data analysis

Affiliation

  • 1 Women and Infants Research Foundation, King Edward Memorial Hospital, Subiaco, Perth, Australia.
  • PMID: 10970013
  • DOI: 10.1046/j.1365-2753.2000.00216.x

Statistical analysis of both experimental and observational data is central to medical research. Unfortunately, the process of conventional statistical analysis is poorly understood by many medical scientists. This is due, in part, to the counter-intuitive nature of the basic tools of traditional (frequency-based) statistical inference. For example, the proper definition of a conventional 95% confidence interval is quite confusing. It is based upon the imaginary results of a series of hypothetical repetitions of the data generation process and subsequent analysis. Not surprisingly, this formal definition is often ignored and a 95% confidence interval is widely taken to represent a range of values that is associated with a 95% probability of containing the true value of the parameter being estimated. Working within the traditional framework of frequency-based statistics, this interpretation is fundamentally incorrect. It is perfectly valid, however, if one works within the framework of Bayesian statistics and assumes a 'prior distribution' that is uniform on the scale of the main outcome variable. This reflects a limited equivalence between conventional and Bayesian statistics that can be used to facilitate a simple Bayesian interpretation based on the results of a standard analysis. Such inferences provide direct and understandable answers to many important types of question in medical research. For example, they can be used to assist decision making based upon studies with unavoidably low statistical power, where non-significant results are all too often, and wrongly, interpreted as implying 'no effect'. They can also be used to overcome the confusion that can result when statistically significant effects are too small to be clinically relevant. 
This paper describes the theoretical basis of the Bayesian-based approach and illustrates its application with a practical example that investigates the prevalence of major cardiac defects in a cohort of children born using the assisted reproduction technique known as ICSI (intracytoplasmic sperm injection).
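The limited equivalence described above can be made concrete with a toy sketch (ours, not the paper's example): for normal data with known standard deviation and a uniform (flat) prior on the mean, the 95% Bayesian credible interval coincides numerically with the conventional 95% confidence interval, which is what licenses the intuitive "95% probability" reading.

```python
import math

# Toy sketch: normal data, known sigma, flat prior on the mean mu.
def conf_interval(xbar, sigma, n, z=1.959964):
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def credible_interval(xbar, sigma, n, z=1.959964):
    post_mean = xbar                  # flat prior: posterior centred on xbar
    post_sd = sigma / math.sqrt(n)    # posterior sd of mu
    return post_mean - z * post_sd, post_mean + z * post_sd

ci = conf_interval(5.2, 2.0, 25)
cred = credible_interval(5.2, 2.0, 25)
# The two intervals coincide, so under the flat-prior Bayesian view it is
# valid to say there is a 95% probability that mu lies in this interval.
```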



Bayesian Hierarchical Models

  • 1 Berry Consultants LLC, Austin, Texas

Treatment effects will differ from one study to another evaluating similar therapies, both because of random variation between individual patients and because of true between-study differences arising from factors such as inclusion criteria and temporal trends. The sources of variability have many levels: one level involves the random differences between individual patients, and another involves the systematic differences that exist between studies. This multilevel or hierarchical information occurs in many research settings, such as in cluster-randomized trials and meta-analyses.1,2 Sources of variation can be better understood and quantified if treatment effect estimates from each individual study are examined in relation to the totality of information available in all the studies.
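This pooling of information across levels can be sketched numerically (an illustrative toy, not the authors' analysis): each study's estimate is shrunk toward the overall mean, with noisier studies shrunk more. The between-study standard deviation tau is fixed here for simplicity; a full hierarchical analysis would estimate it from the data.

```python
# Toy partial-pooling sketch (ours): shrink study estimates toward the
# precision-weighted overall mean; tau is the assumed between-study sd.
def pool(estimates, std_errors, tau):
    w = [1.0 / se**2 for se in std_errors]
    mu = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)  # overall mean
    shrunk = []
    for y, se in zip(estimates, std_errors):
        prec_data, prec_prior = 1.0 / se**2, 1.0 / tau**2
        # Posterior mean of each study effect: precision-weighted average.
        shrunk.append((prec_data * y + prec_prior * mu) / (prec_data + prec_prior))
    return mu, shrunk

# Three studies; the second is much noisier and gets shrunk the most.
mu, shrunk = pool([0.8, 0.2, 0.5], [0.1, 0.4, 0.2], tau=0.3)
```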


McGlothlin AE , Viele K. Bayesian Hierarchical Models. JAMA. 2018;320(22):2365–2366. doi:10.1001/jama.2018.17977




Statistics > Methodology

Title: Generative Bayesian Modeling with Implicit Priors

Abstract: Generative models are a cornerstone of Bayesian data analysis, enabling predictive simulations and model validation. However, in practice, manually specified priors often lead to unreasonable simulation outcomes, a common obstacle for full Bayesian simulations. As a remedy, we propose to add small portions of real or simulated data, which creates implicit priors that improve the stability of Bayesian simulations. We formalize this approach, providing a detailed process for constructing implicit priors and relating them to existing research in this area. We also integrate the implicit priors into simulation-based calibration, a prominent Bayesian simulation task. Through two case studies, we demonstrate that implicit priors quickly improve simulation stability and model calibration. Our findings provide practical guidelines for implementing implicit priors in various Bayesian modeling scenarios, showcasing their ability to generate more reliable simulation outcomes.
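One way to picture the idea (our own toy construction, not the authors' procedure): conditioning a vague prior on a few "anchor" observations yields an implicit prior whose predictive simulations are far more stable than those from the vague prior alone.

```python
import math

# Toy sketch of an implicit prior (ours, not the paper's method):
# a normal model with known sigma = 1 and a vague normal prior on the mean.
def posterior(prior_mean, prior_sd, anchors, sigma=1.0):
    """Conjugate normal update of the mean given a few anchor points."""
    prec = 1.0 / prior_sd**2 + len(anchors) / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(anchors) / sigma**2) / prec
    return mean, math.sqrt(1.0 / prec)

def predictive_sd(prior_sd, sigma=1.0):
    """Sd of a single simulated observation under a given prior on the mean."""
    return math.sqrt(prior_sd**2 + sigma**2)

vague = (0.0, 100.0)                                   # vague manual prior
implicit = posterior(*vague, anchors=[4.8, 5.1, 5.3])  # implicit prior
vague_pred = predictive_sd(vague[1])       # wildly dispersed simulations
implicit_pred = predictive_sd(implicit[1])  # simulations on a sane scale
```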
Subjects: Methodology (stat.ME)
Advances in Bayesian Model Selection Consistency for High-Dimensional Generalized Linear Models

8 Aug 2024 · Jeyong Lee, Minwoo Chae, Ryan Martin

Uncovering genuine relationships between a response variable of interest and a large collection of covariates is a fundamental and practically important problem. In the context of Gaussian linear models, both the Bayesian and non-Bayesian literature is well-developed and there are no substantial differences in the model selection consistency results available from the two schools. For the more challenging generalized linear models (GLMs), however, Bayesian model selection consistency results are lacking in several ways. In this paper, we construct a Bayesian posterior distribution using an appropriate data-dependent prior and develop its asymptotic concentration properties using new theoretical techniques. In particular, we leverage Spokoiny's powerful non-asymptotic theory to obtain sharp quadratic approximations of the GLM's log-likelihood function, which leads to tight bounds on the errors associated with the model-specific maximum likelihood estimators and the Laplace approximation of our Bayesian marginal likelihood. In turn, these improved bounds lead to significantly stronger, near-optimal Bayesian model selection consistency results, e.g., far weaker beta-min conditions, compared to those available in the existing literature. In particular, our results are applicable to the Poisson regression model, in which the score function is not sub-Gaussian.
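The Laplace approximation to a marginal likelihood that the abstract relies on can be illustrated in one dimension (our toy example, far simpler than the paper's GLM setting): for a Beta-Binomial model the exact marginal likelihood is available in closed form, so the quality of the approximation can be checked directly.

```python
import math

# Toy 1-D Laplace approximation (ours, not the paper's construction).
def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def laplace_log_marginal(k, n, a, b):
    # log joint g(t) = log C(n,k) + c1*log t + c2*log(1-t) - log B(a,b)
    c1, c2 = k + a - 1, n - k + b - 1
    t_hat = c1 / (c1 + c2)                       # posterior mode
    g = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
         + c1 * math.log(t_hat) + c2 * math.log(1 - t_hat) - log_beta(a, b))
    hess = c1 / t_hat**2 + c2 / (1 - t_hat)**2   # -g''(t_hat)
    # Laplace: integral of exp(g) ~ exp(g(t_hat)) * sqrt(2*pi / hess)
    return g + 0.5 * math.log(2 * math.pi / hess)

def exact_log_marginal(k, n, a, b):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + log_beta(a + k, b + n - k) - log_beta(a, b))

approx = laplace_log_marginal(7, 10, 2, 2)
exact = exact_log_marginal(7, 10, 2, 2)  # the two agree closely
```

The paper's contribution is precisely about controlling the error of such approximations (and of the quadratic log-likelihood expansions behind them) uniformly over high-dimensional model spaces, which this scalar example does not attempt.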


A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research

Rens van de Schoot

Utrecht University and North-West University

David Kaplan

University of Wisconsin–Madison

Jaap Denissen

Tilburg University

Jens B Asendorpf

Humboldt-University Berlin

Franz J Neyer

Friedrich Schiller University of Jena

Marcel AG van Aken

Utrecht University

Associated Data

Appendix S1. Bayes Theorem in More Details.

Appendix S2. Bayesian Statistics in Mplus.

Appendix S3. Bayesian Statistics in WinBUGS.

Appendix S4. Bayesian Statistics in AMOS.

Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided.

… it is clear that it is not possible to think about learning from experience and acting on it without coming to terms with Bayes’ theorem . Jerome Cornfield (in De Finetti, 1974a )

In this study, we provide a gentle introduction to Bayesian analysis and the Bayesian terminology without the use of formulas. We show why it is attractive to adopt a Bayesian perspective and, more practically, how to estimate a model from a Bayesian perspective using background knowledge in the actual data analysis and how to interpret the results. Many developmental researchers might never have heard of Bayesian statistics, or if they have, they most likely have never used it for their own data analysis. However, Bayesian statistics is becoming more common in social and behavioral science research. As stated by Kruschke ( 2011a ), in a special issue of Perspectives on Psychological Science :

whereas the 20th century was dominated by NHST [null hypothesis significance testing], the 21st century is becoming Bayesian. (p. 272)

Bayesian methods are also slowly becoming used in developmental research. For example, a number of Bayesian articles have been published in Child Development ( n = 5), Developmental Psychology ( n = 7), and Development and Psychopathology ( n = 5) in the last 5 years (e.g., Meeus, Van de Schoot, Keijsers, Schwartz, & Branje, 2010 ; Rowe, Raudenbush, & Goldin-Meadow, 2012 ). The increase in Bayesian applications is not just taking place in developmental psychology but also in psychology in general. This increase is specifically due to the availability of Bayesian computational methods in popular software packages such as Amos (Arbuckle, 2006 ), Mplus v6 ( Muthén & Muthén, 1998–2012 ; for the Bayesian methods in Mplus see Kaplan & Depaoli, 2012 ; Muthén & Asparouhov, 2012 ), WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000 ), and a large number of packages within the R statistical computing environment (Albert, 2009 ).

Of specific concern to substantive researchers, the Bayesian paradigm offers a very different view of hypothesis testing (e.g., Kaplan & Depaoli, 2012, 2013; Walker, Gustafson, & Frimer, 2007; Zhang, Hamagami, Wang, Grimm, & Nesselroade, 2007). Specifically, Bayesian approaches allow researchers to incorporate background knowledge into their analyses instead of testing essentially the same null hypothesis over and over again, ignoring the lessons of previous studies. In contrast, statistical methods based on the frequentist (classical) paradigm (i.e., the default approach in most software) often involve testing the null hypothesis. In plain terms, the null hypothesis states that "nothing is going on." This hypothesis might be a bad starting point because, based on previous research, it is almost always expected that "something is going on." Replication is an important and indispensable tool in psychology in general (Asendorpf et al., 2013), and Bayesian methods fit within this framework because background knowledge is integrated into the statistical model. As a result, the plausibility of previous research findings can be evaluated in relation to new data, which makes the proposed method an interesting tool for confirmatory strategies.

The organization of this study is as follows: First, we discuss probability in the frequentist and Bayesian frameworks, followed by a description, in general terms, of the essential ingredients of a Bayesian analysis using a simple example. To illustrate Bayesian inference, we reanalyze a series of studies on the theoretical framework of dynamic interactionism, in which individuals are believed to develop through a dynamic and reciprocal transaction between personality and the environment. We thereby apply the Bayesian approach within a structural equation modeling (SEM) framework in an area of developmental psychology where theory building and replication play a strong role. We conclude with a discussion of the advantages of adopting a Bayesian point of view in the context of developmental research. In the online supporting information appendices, we provide an introduction to the computational machinery of Bayesian statistics, as well as annotated syntax for running Bayesian analyses using Mplus, WinBUGS, and Amos.

Probability

Most researchers recognize the important role that statistical analyses play in addressing research questions. However, not all researchers are aware of the theories of probability that underlie model estimation, as well as the practical differences between these theories. These two theories are referred to as the frequentist paradigm and the subjective probability paradigm .

Conventional approaches to developmental research derive from the frequentist paradigm of statistics, advocated mainly by R. A. Fisher, Jerzy Neyman, and Egon Pearson. This paradigm associates probability with long-run frequency. The canonical example of long-run frequency is the notion of an infinite coin toss: A sample space of possible outcomes (heads and tails) is enumerated, and probability is defined as the proportion of a given outcome (say, heads) over the number of coin tosses, in the limit of infinitely many tosses.
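The long-run frequency notion is easy to simulate. The sketch below (plain Python; a fair coin is assumed, and the function name is ours) shows the proportion of heads stabilizing around .5 as the number of simulated tosses grows.

```python
import random

random.seed(0)

def prop_heads(n_tosses):
    """Proportion of heads in n_tosses simulated fair-coin tosses."""
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

# The proportion fluctuates widely for small n and settles near .5:
props = [prop_heads(n) for n in (100, 10_000, 1_000_000)]
```

For small numbers of tosses the proportion can deviate noticeably from .5; only in the long run does it converge, which is exactly the sense in which the frequentist paradigm defines probability.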

The Bayesian paradigm, in contrast, interprets probability as the subjective experience of uncertainty (De Finetti, 1974b). Bayes’ theorem is a model for learning from data, as suggested in the Cornfield quote at the beginning of this study. In this paradigm, the classic example of the subjective experience of uncertainty is the notion of placing a bet. Here, unlike in the frequentist paradigm, there is no notion of infinitely repeating an event of interest. Rather, placing a bet (e.g., on a baseball game or horse race) involves using as much prior information as possible as well as personal judgment. Once the outcome is revealed, the prior information is updated; this learning from experience (data) is the essence of the Cornfield quote. Table 1 provides an overview of similarities and differences between frequentist and Bayesian statistics.

Overview of the Similarities and Differences Between Frequentist and Bayesian Statistics

| | Frequentist statistics | Bayesian statistics |
|---|---|---|
| Definition of the *p* value | The probability of observing the same or more extreme data assuming that the null hypothesis is true in the population | The probability of the (null) hypothesis |
| Large samples needed? | Usually, when normal theory-based methods are used | Not necessarily |
| Inclusion of prior knowledge possible? | No | Yes |
| Nature of the parameters in the model | Unknown but fixed | Unknown and therefore random |
| Population parameter | One true value | A distribution of values reflecting uncertainty |
| Uncertainty is defined by | The sampling distribution based on the idea of infinite repeated sampling | Probability distribution for the population parameter |
| Estimated intervals | Confidence interval: Over an infinity of samples taken from the population, 95% of these contain the true population value | Credibility interval: A 95% probability that the population value is within the limits of the interval |

The goal of statistics is to use the data to say something about the population. In estimating, for example, the mean of some variable in a population, the mean of the sample data is a “statistic” (i.e., estimated mean) and the unknown population mean is the actual parameter of interest. Similarly, the regression coefficients from a regression analysis remain unknown parameters estimated from data. We refer to means, regression coefficients, residual variances, and so on as unknown parameters in a model. Using software like SPSS, Amos, or Mplus, these unknown parameters can be estimated. One can choose the type of estimator for the computation, for example, maximum likelihood (ML) estimation or Bayesian estimation.

The key difference between Bayesian statistical inference and frequentist (e.g., ML estimation) statistical methods concerns the nature of the unknown parameters. In the frequentist framework, a parameter of interest is assumed to be unknown, but fixed. That is, it is assumed that in the population there is only one true population parameter, for example, one true mean or one true regression coefficient. In the Bayesian view of subjective probability, all unknown parameters are treated as uncertain and therefore should be described by a probability distribution.

The Ingredients of Bayesian Statistics

There are three essential ingredients underlying Bayesian statistics, first described by T. Bayes in an essay published posthumously in 1763 (Bayes & Price, 1763; Stigler, 1986). Briefly, these ingredients can be described as follows; each is explained in more detail in the subsequent sections.

The first ingredient is the background knowledge on the parameters of the model being tested. This first ingredient refers to all knowledge available before seeing the data and is captured in the so-called prior distribution, for example, a normal distribution. The variance of this prior distribution reflects our level of uncertainty about the population value of the parameter of interest: The larger the variance, the more uncertain we are. Prior uncertainty can also be expressed as precision, which is simply the inverse of the variance. The smaller the prior variance, the higher the precision, and the more confident one is that the prior mean reflects the population mean. In this study we will vary the specification of the prior distribution to evaluate its influence on the final results.

The second ingredient is the information in the data themselves. It is the observed evidence expressed in terms of the likelihood function of the data given the parameters. In other words, the likelihood function asks:

Given a set of parameters, such as the mean and/or the variance, what is the likelihood or probability of the data in hand?

The third ingredient is obtained by combining the first two, which is called posterior inference. Both (1) and (2) are combined via Bayes' theorem (described in more detail in the online Appendix S1) and are summarized by the so-called posterior distribution, which is a compromise between the prior knowledge and the observed evidence. The posterior distribution reflects one's updated knowledge, balancing prior knowledge with observed data.

These three ingredients constitute Bayes' theorem, which states, in words, that our updated understanding of parameters of interest given our current data depends on our prior knowledge about the parameters of interest weighted by the current evidence given those parameters of interest. In online Appendix S1 we elaborate on the theorem. In what follows, we will explain Bayes' theorem and its three ingredients in detail.
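For the simple case of a normal mean with known data variance, this prior-times-likelihood compromise has a closed form: the posterior precision is the sum of the prior precision and the data precision, and the posterior mean is their precision-weighted average. The sketch below (plain Python; the scores and hyperparameter values are hypothetical, not taken from the article's data) illustrates the update.

```python
def update_normal_mean(prior_mean, prior_var, data, data_var):
    """Combine a normal prior with normally distributed data (known data
    variance) via Bayes' theorem; returns the posterior mean and variance
    of the population mean."""
    n = len(data)
    sample_mean = sum(data) / n
    prior_precision = 1.0 / prior_var   # precision = 1 / variance
    data_precision = n / data_var       # precision contributed by the sample mean
    post_precision = prior_precision + data_precision
    # Posterior mean: precision-weighted compromise between prior mean
    # and sample mean.
    post_mean = (prior_precision * prior_mean +
                 data_precision * sample_mean) / post_precision
    return post_mean, 1.0 / post_precision

# Hypothetical reading-skills scores with sample mean 102 (n = 20):
scores = [95.0, 99.0, 102.0, 105.0, 109.0] * 4

# A near-flat prior barely moves the data-based estimate:
mean1, var1 = update_normal_mean(100.0, 1e6, scores, 15.0 ** 2)
# An informative prior centred on 100 pulls the estimate towards 100
# and shrinks the posterior variance:
mean2, var2 = update_normal_mean(100.0, 1.0, scores, 15.0 ** 2)
```

With the near-flat prior the posterior mean stays essentially at the sample mean of 102; with the informative prior it lands between 100 and 102, and the posterior variance is smaller.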

Prior Knowledge

Why Define Prior Knowledge?

The key epistemological reason concerns our view that progress in science generally comes about by learning from previous research findings and incorporating information from these research findings into our present studies. Often information gleaned from previous research is incorporated into our choice of designs, variables to be measured, or conceptual diagrams to be drawn. With the Bayesian methodology our prior beliefs are made explicit, and are moderated by the actual data in hand. (Kaplan & Depaoli, 2013 , p. 412)

How to Define Prior Knowledge?

The data we have in our hands moderate our prior beliefs regarding the parameters and thus lead to updated beliefs. But how do we specify priors? The choice of a prior is based on how much information we believe we have prior to the data collection and how accurate we believe that information to be. There are roughly two scenarios. First, in some cases we may not be in possession of enough prior information to aid in drawing posterior inferences. From a Bayesian point of view, this lack of information is still important to consider and incorporate into our statistical specifications.

In other words, it is equally important to quantify our ignorance as it is to quantify our cumulative understanding of a problem at hand. (Kaplan & Depaoli, 2013 , p. 412)

Second, in some cases we may have considerable prior information regarding the value of a parameter and our sense of the accuracy around that value. For example, after decades of research on the relation between, say, parent socioeconomic status and student achievement, we may, with a bit of effort, be able to provide a fairly accurate prior distribution on the parameter that measures that relation. Prior information can also be obtained from meta-analyses and also previous waves of surveys. These sources of information regarding priors are “objective” in the sense that others can verify the source of the prior information. This should not be confused with the notion of “objective priors,” which constitute pure ignorance of background knowledge. Often the so-called uniform distribution is used to express an objective prior. For some subjective Bayesians, priors can come from any source: objective or otherwise. The issue just described is referred to as the “elicitation problem” and has been nicely discussed in O'Hagan et al. ( 2006 ; see also Rietbergen, Klugkist, Janssen, Moons, & Hoijtink, 2011 ; Van Wesel, 2011 ). If one is unsure about the prior distribution, a sensitivity analysis is recommended (e.g., Gelman, Carlin, Stern, & Rubin, 2004 ). In such an analysis, the results of different prior specifications are compared to inspect the influence of the prior. We will demonstrate sensitivity analyses in our examples.
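A sensitivity analysis of this kind can be sketched in a few lines. Using the conjugate update for a normal mean with known data variance, the code below (plain Python; all prior settings and the sample summary are hypothetical) compares the posterior mean under several prior specifications, including a deliberately misspecified one.

```python
def posterior_mean(prior_mean, prior_var, sample_mean, n, data_var):
    """Conjugate posterior mean for a normal mean with known data variance."""
    prior_prec = 1.0 / prior_var   # precision = 1 / variance
    data_prec = n / data_var       # precision contributed by the sample mean
    return (prior_prec * prior_mean + data_prec * sample_mean) / (prior_prec + data_prec)

# Hypothetical sample summary: mean 102, n = 20, data variance 15**2 = 225.
priors = {
    "noninformative": (100.0, 1e6),    # huge variance: the data dominate
    "low-informative": (100.0, 100.0),
    "informative": (100.0, 1.0),       # strong pull towards 100
    "misspecified": (80.0, 1.0),       # strong pull towards the wrong value
}
results = {label: round(posterior_mean(m, v, 102.0, 20, 225.0), 2)
           for label, (m, v) in priors.items()}
```

Comparing the entries of `results` shows at a glance how sensitive the conclusion is: the noninformative and low-informative priors leave the estimate near 102, while the misspecified high-precision prior drags it far towards 80.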

Let us use a very simple example to introduce the prior specification. We will only estimate two parameters: the mean and variance of reading skills, for example, measured at entry to kindergarten for children in a state-funded prekindergarten program. To introduce the Bayesian methodology, we will first focus on this extremely simple case, and only thereafter will we consider a more complex (and often more realistic) example. In online Appendices S2–S4 we provide the syntax for analyzing this example using Mplus, WinBugs, and Amos.

The prior reflects our knowledge about the mean reading skills score before observing the current data. Different priors can be constructed reflecting different types of prior knowledge. Throughout the study we will use priors with different levels of subjectivity to illustrate the effects of using background knowledge. In the section covering our real-life example, we base our prior specification on previous research results, but in the current section we discuss several hypothetical prior specifications. In Figure 1, six different distributions of possible reading skills scores are displayed, representing different degrees of prior knowledge. These distributions could reflect expert knowledge and/or results from previous similar research studies or meta-analyses.

Figure 1. A priori beliefs about the distribution of reading skills scores in the population.

Noninformative Prior Distributions

In Figure 1a, it is assumed that we do not know anything about the mean reading skills score and that every value of the mean between minus infinity and plus infinity is equally likely. Such a distribution is often referred to as a noninformative prior distribution.

A frequentist analysis of the problem would ignore our accumulated knowledge and let the data speak for themselves—as if there has never been any prior research on reading skills. However, it could be reasonably argued that empirical psychology has accumulated a considerable amount of empirical information about the distribution of reading skills scores in the population.

Informative Prior Distributions

From a Bayesian perspective, it seems natural to incorporate what has been learned so far into our analysis. This implies that we specify a distribution for the mean of the reading skills scores. The parameters of the prior distribution are referred to as hyperparameters. If a normal prior distribution is specified for the mean reading score, the hyperparameters are the prior mean and the prior precision. Thus, based on previous research one can specify the expected prior mean. If reading skills are assessed by a standardized test with a mean of 100, we hypothesize that reading skills scores close to 100 are more likely to occur in our data than values further away from 100, but every value in the entire range between minus and plus infinity is still allowed.

Also, the prior precision needs to be specified, which reflects the certainty or uncertainty about the prior mean. The more certain we are, the smaller we can specify the prior variance and, as such, the higher the precision of our prior will be. Such a prior distribution encodes our existing knowledge and is referred to as a subjective or informative prior distribution. If a low precision is specified, such as Prior 2 in Figure 1b, it is often referred to as a low-informative prior distribution. Note that it is the variance of the prior distribution we are considering here, not the sampling variance of the mean reading skills score.

One could question how realistic Priors 1 and 2 in Figures 1a and 1b are if a reading skills score is the variable of interest. What is a negative reading skills score? And can reading skills result in any positive score? To assess reading skills, a reading test could be used. What if we use a reading test where the minimum possible score is 40 and the maximum possible score is 180? When using such a test, Priors 1 and 2 are not really sensible. In Figure 1c, a third prior distribution is specified where values outside the range 40–180 are not allowed and every reading skills score within this range is equally likely.

Perhaps we can include even more information in our prior distribution, with the goal of increasing precision and thereby contributing to more accurate estimates. As said before, we assume that our data are obtained from a randomly selected sample from the general population. In that case we might expect a mean of reading skills scores close to 100 to be more probable than extremely low or high scores. In Figure 1d, a prior distribution is displayed that represents this expectation. Figure 1e shows that we can increase the precision of our prior distribution by decreasing its prior variance.

In Figure 1f, a prior distribution is specified where a very low mean score of reading skills is expected and we are very certain about obtaining such a mean score in our data. This is reflected by a prior distribution with high precision, that is, a small prior variance. If we sample from the general population, such a prior distribution would be highly unlikely to be supported by the data and, in this case, would be a misspecified prior. If, however, we have specified inclusion criteria for our sample, for example, only children with reading skills scores lower than 80 are included because this is the target group, then Prior 6 is correctly specified and Prior 5 would be misspecified.

To summarize, the prior reflects our knowledge about the parameters of our model before observing the current data. If one does not want to specify any prior knowledge, then noninformative priors can be specified and as a result, the final results will not be influenced by the specification of the prior. In the Bayesian literature, this approach to using noninformative priors is referred to as objective Bayesian statistics (Press, 2003 ) because only the data determine the posterior results. Using the objective Bayesian method, one can still benefit from using Bayesian statistics as will be explained throughout the study.

If a low-informative prior is specified, the results are hardly influenced by the specification of the prior, particularly for large samples. The more prior information is added, the more subjective it becomes. Subjective priors are beneficial because: (a) findings from previous research can be incorporated into the analyses and (b) Bayesian credible intervals will be smaller. Both benefits will be discussed more thoroughly in the section where we discuss our posterior results. Note that the term subjective has been a source of controversy between Bayesians and frequentists. We prefer the term informative and argue that the use of any kind of prior be warranted by appealing to empirical evidence. However, for this study, we stay with the term subjective because it is more commonly used in the applied and theoretical literature.

Note that for each and every parameter in the model, a prior distribution needs to be specified. As we have specified a prior distribution for the mean of the reading skills scores, we also have to specify a prior distribution for the variance/standard deviation of reading skills. This is because in Bayesian statistics we assume a distribution for each and every parameter, including (co)variances. As we might have weaker prior expectations about the variance of reading skills, we might want to specify a low-informative prior distribution for it. If we specify the prior for the (residual) variance in such a way that it can only take positive values, the obtained posterior distribution can never contain negative values, such as a negative (residual) variance.
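One conventional way to enforce this positivity constraint is a gamma prior on the precision (the inverse variance), the kind of low-informative prior we use for WinBUGS later in this study. The sketch below (plain Python; the hyperparameter values are purely illustrative, not package defaults) shows that every implied variance value is strictly positive by construction.

```python
import random

random.seed(1)

# Draw precisions from a gamma prior; shape and scale are illustrative.
precisions = [random.gammavariate(2.0, 0.5) for _ in range(5000)]

# Because every gamma draw is positive, the implied variance
# (1 / precision) can never be negative.
implied_variances = [1.0 / p for p in precisions]
```

A normal prior on the variance itself, by contrast, would place prior mass on impossible negative values.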

Observed Evidence

After specifying the prior distribution for all parameters in the model, one can begin analyzing the actual data. Let us say we have information on the reading skills scores of 20 children. We used the software BIEMS (Mulder, Hoijtink, & de Leeuw, 2012) to generate an exact data set where the mean and standard deviation of the reading skills scores were manually specified. The second component of a Bayesian analysis is the observed evidence for our parameters in the data (i.e., the sample mean and variance of the reading skills scores). This information is summarized by the likelihood function, which contains the information about the parameters given the data set (i.e., akin to a histogram of possible values). The likelihood is a function reflecting which values of the unknown parameters are most likely, given the data. Note that the likelihood function is also obtained when non-Bayesian analyses are conducted using ML estimation. In our hypothetical example, the sample mean is 102. So, given the data, a reading skills score of 102 is the most likely value of the population mean; that is, the likelihood function achieves its maximum for this value.
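The claim that the likelihood peaks at the sample mean can be checked directly. The sketch below (plain Python; the scores and the fixed standard deviation are hypothetical) evaluates the normal log-likelihood at several candidate means and picks the best-supported one.

```python
import math

def log_likelihood(mu, sigma, data):
    """Log-likelihood of normal data for a candidate mean mu and sd sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# Hypothetical reading-skills scores with sample mean 102 (n = 20):
scores = [95.0, 99.0, 102.0, 105.0, 109.0] * 4

# Scan candidate population means; the sample mean wins.
candidates = [98.0, 100.0, 102.0, 104.0, 106.0]
best = max(candidates, key=lambda mu: log_likelihood(mu, 15.0, scores))
```

Among the candidates, `best` equals the sample mean, illustrating that the likelihood function achieves its maximum there.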

Posterior Distribution

With the prior distribution and current data in hand, these are then combined via Bayes’ theorem to form the so-called posterior distribution . Specifically, in this case, Bayes’ theorem states that our prior knowledge is updated by the current data to yield updated knowledge in the form of the posterior distribution. That is, we can use our prior information in estimating the population mean, variance, and other aspects of the distribution for this sample.

In most cases, obtaining the posterior distribution is done by simulation, using so-called Markov chain Monte Carlo (MCMC) methods. The general idea of MCMC is that instead of attempting to solve analytically for point estimates, as with ML estimation, an iterative sampling procedure is used to approximate the posterior distribution of the parameters. For a more detailed introduction, see Kruschke (2011b, 2013), and for a more technical introduction, see Lynch (2007) or Gelman et al. (2004). See online Appendix S1 for a brief introduction.
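To demystify the idea, the sketch below implements a toy random-walk Metropolis sampler for the mean of normal data with known standard deviation and a normal prior. It is a stand-in for the general MCMC machinery, not the specific samplers used by WinBUGS or Mplus, and all data and tuning values are hypothetical.

```python
import math
import random

def metropolis_mean(data, sigma, prior_mean, prior_sd, n_iter=20000, seed=0):
    """Toy random-walk Metropolis sampler for the mean of normal data
    with known sd `sigma` and a normal prior on the mean."""
    rng = random.Random(seed)

    def log_post(mu):  # log prior + log likelihood, up to a constant
        log_lik = sum(-(x - mu) ** 2 / (2 * sigma ** 2) for x in data)
        log_prior = -(mu - prior_mean) ** 2 / (2 * prior_sd ** 2)
        return log_lik + log_prior

    mu, draws = prior_mean, []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, 2.0)   # random-walk step
        # Accept with probability min(1, posterior ratio):
        if math.log(rng.random()) < log_post(proposal) - log_post(mu):
            mu = proposal
        draws.append(mu)
    return draws[n_iter // 2:]                # discard burn-in

# Hypothetical scores (sample mean 102) with a nearly flat prior:
scores = [95.0, 99.0, 102.0, 105.0, 109.0] * 4
draws = metropolis_mean(scores, 15.0, 100.0, 1000.0)
post_mean_est = sum(draws) / len(draws)
```

Averaging the retained draws recovers a posterior mean close to the sample mean of 102, as expected under a nearly flat prior; summaries such as posterior standard deviations and interval limits are computed from the same draws.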

The Posterior Distribution in the Example

The graphs in Figure 2 demonstrate how the prior information and the information in the data are combined in the posterior distribution. The more information is specified in the prior distribution, the narrower the posterior distribution of reading skills becomes. As long as the prior is noninformative (see Figure 2a), the result obtained for the mean with ML estimation and the posterior mean will be approximately similar. If an informative prior is specified, the posterior mean is only similar to the ML mean if the prior mean is (relatively) similar to the ML estimate (see Figures 2b to 2e). If the prior mean differs from the ML mean (Prior 6), the posterior mean will shift toward the prior (see Figure 2f).

Figure 2. The likelihood function and posterior distributions for six different specifications of the prior distribution.

The precision of the prior distribution for the reading skills scores influences the posterior distribution. If a noninformative prior is specified, the variance of the posterior distribution is not influenced by the prior (see Figure 2a). The more certain one is about the prior, the smaller the posterior variance, and hence the more peaked the posterior will be (cf. Figures 2d and 2e).

Posterior Probability Intervals (PPIs)

Let us now take a closer look at the actual parameter estimates. We analyzed our data set with Mplus, Amos, and WinBUGS. Not all prior specifications are available in each software package; this has been indicated in Table 2 by using subscripts. In Mplus, the default prior distributions for means and regression coefficients are normal distributions with a prior mean of zero and an infinitely large prior variance, that is, low precision (see Figure 1b). If the prior precision of a specific parameter is set low enough, then the prior in Figure 1a will be approximated. The other prior specifications in Figure 1 are not available in Mplus. In Amos, however, one can specify a uniform prior, as in Figure 1a, but also normal distributions, as in Figure 1b, and a uniform distribution using the boundaries of the underlying scale, as in Figure 1c. If the prior distributions of Figures 1d to 1f are of interest, one needs to switch to WinBUGS. We assumed no prior information for the variance of the reading skills scores: We used the default settings in Amos and Mplus, and in WinBUGS we used a low-informative gamma distribution.

Posterior Results Obtained With Mplus, Amos, or WinBUGS (n = 20)

| Prior type | Prior precision used (prior mean was always 100) | Posterior mean reading skills score | 95% CI/PPI |
|---|---|---|---|
| ML | | 102.00 | 94.42–109.57 |
| Prior 1 | | 101.99 | 94.35–109.62 |
| Prior 2a | Large variance, i.e., Var. = 100 | 101.99 | 94.40–109.42 |
| Prior 2b | Medium variance, i.e., Var. = 10 | 101.99 | 94.89–109.07 |
| Prior 2c | Small variance, i.e., Var. = 1 | 102.00 | 100.12–103.87 |
| Prior 3 | | 102.03 | 94.22–109.71 |
| Prior 4 | Medium variance, i.e., Var. = 10 | 102.00 | 97.76–106.80 |
| Prior 5 | Small variance, i.e., Var. = 1 | 102.00 | 100.20–103.90 |
| Prior 6a | Large variance, i.e., Var. = 100 | 99.37 | 92.47–106.10 |
| Prior 6b | Medium variance, i.e., Var. = 10 | 86.56 | 80.17–92.47 |

Note . CI = confidence interval; PPI = posterior probability interval; ML = maximum likelihood results; SD = standard deviation; M = posterior mean obtained using Mplus; A = posterior mean obtained using Amos; W = posterior mean obtained using WinBUGS.

In the table, the posterior mean reading skills score and the PPIs are displayed for the six different types of prior specifications in our hypothetical example. Recall that the frequentist confidence interval is based on the assumption of a very large number of repeated samples from a population that is characterized by a fixed and unknown parameter. For any given sample, we can obtain the sample mean and compute, for example, a 95% confidence interval. The correct frequentist interpretation is that, over repeated sampling, 95% of such confidence intervals capture the true but unknown parameter. Unfortunately, results of the frequentist paradigm are often misunderstood (see Gigerenzer, 2004). For example, the frequentist-based 95% confidence interval is often interpreted as meaning that there is a 95% chance that the parameter of interest lies between the upper and lower limits, whereas the correct interpretation is that 95 of 100 exact replications of the same experiment would yield intervals that capture the fixed but unknown parameter.

The Bayesian counterpart of the frequentist confidence interval is the PPI, also referred to as the credibility interval. A 95% PPI states that there is a 95% probability that the population parameter lies between the two interval limits. Note, however, that although the PPI and the confidence interval may be numerically similar and may serve related inferential goals, they are not mathematically equivalent and are conceptually quite different. We argue that the PPI is easier to communicate because it expresses the actual probability that a parameter lies between two numbers, which is not the definition of a classical confidence interval (see also Table 1).
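Given posterior draws (e.g., MCMC output), a 95% PPI can be read off directly as the 2.5th and 97.5th percentiles of the draws. The sketch below (plain Python) fakes the draws from a hypothetical normal posterior with mean 102 and standard deviation 3.35; with real MCMC output the percentile step is identical.

```python
import random

random.seed(42)

# Stand-in for MCMC output: draws from a hypothetical posterior N(102, 3.35^2).
draws = sorted(random.gauss(102.0, 3.35) for _ in range(100_000))

# The central 95% of the posterior mass bounds the 95% PPI.
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
# Interpretation: a 95% probability that the population mean lies in
# [lower, upper], given the model and the prior.
```

No appeal to hypothetical repeated sampling is needed: the interval is a direct probability statement about the parameter.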

Posterior Results of the Example

The posterior results are influenced by the prior specification: The higher the prior precision, the smaller the posterior variance and the more certain one can be about the results. Let us examine this relation using our example.

When the prior in Figure 2a is used, the posterior distribution is hardly influenced by the prior distribution. The estimates for the mean reading skills score obtained from the likelihood function (i.e., the ML results) and the posterior result are close to each other (see the first two rows of Table 2). If a normal distribution for the prior is used, as in Figure 2b, the 95% PPI is only influenced when a high-precision prior is specified; see Table 2 and compare the results of Priors 2a, 2b, and 2c, where only for Prior 2c the resulting PPI is smaller than for the other priors we discussed so far. This makes sense because for the latter prior we specified a highly informative distribution; that is, the variance of the prior distribution is quite small, reflecting strong prior beliefs. If the prior of Figure 2c is used, the results are similar to the ML results. When the prior of Figure 2c is combined with a normal distribution (Prior 4), the PPI decreases again. If we increase the prior precision of the mean even further, as for Prior 5, the PPI decreases even more. If the prior mean is misspecified, as in Figure 2f, the posterior mean will be affected; see the results in Table 2 for Priors 6a and 6b. The difference between Priors 6a and 6b reflects the degree of certainty we have about the prior mean. For Prior 6b we are rather sure the mean is 80, which is reflected by a high prior precision (a small prior variance). For Prior 6a we are less sure, and we used a low prior precision. The posterior mean of Prior 6a is therefore closer to the ML estimate than the posterior mean of Prior 6b.

To summarize, the more prior information is added to the model, the smaller the 95% PPI becomes, which is a nice feature of the Bayesian methodology. That is, after confronting the prior knowledge with the data, one can be more certain about the obtained results than with the frequentist method. This way, science can be truly cumulative. However, when the prior is misspecified, the posterior results are affected, because the posterior results are always a compromise between the prior distribution and the likelihood function of the data.

An Empirical Example

To illustrate the Bayesian methods explained in this study, we consider a series of articles that study the theoretical framework of dynamic interactionism where individuals are believed to develop through a dynamic and reciprocal transaction between personality and the environment (e.g., quality of social relationships; Caspi, 1998 ). The main aim of the examined research program was to study the reciprocal associations between personality and relationships over time. In the case of extraversion, for example, an extraverted adolescent might seek out a peer group where extraversion is valued and reinforced, and as such becomes more extraverted.

A theory that explains environmental effects on personality is social investment theory (Roberts, Wood, & Smith, 2005). This theory predicts that the successful fulfillment of societal roles (in work, relationships, and health) leads to the strengthening of those personality dimensions that are relevant for this fulfillment. For this study, social investment theory is important because it can be hypothesized that the effects of fulfilling societal roles on personality are stronger in emerging adulthood, when these roles are more central, than in earlier phases of adolescence. At the time of the first article in our series (Asendorpf & Wilpers, 1998), however, the predictions of social investment theory had not yet been published. Instead, the authors started with a theoretical notion by McCrae and Costa (1996) that personality influences would be more important in predicting social relationships than vice versa. At the time, however, the idea did not yet have much support because:

empirical evidence on the relative strength of personality effects on relationships and vice versa is surprisingly limited. (p. 1532)

Asendorpf and Wilpers (1998) investigated for the first time personality and relationships over time in a sample of young students (N = 132) after their transition to university. The main conclusion of their analyses was that personality influenced change in social relationships, but not vice versa. Neyer and Asendorpf (2001) then studied personality–relationship transactions using a large representative sample of young adults from all over Germany (aged between 18 and 30 years; N = 489). Based on the previous results, Neyer and Asendorpf

hypothesized that personality effects would have a clear superiority over relationships effects. (p. 1193)

Consistent with Asendorpf and Wilpers ( 1998 ), Neyer and Asendorpf ( 2001 ) concluded that once initial correlations were controlled, personality traits predicted change in various aspects of social relationships, whereas effects of antecedent relationships on personality were rare and restricted to very specific relationships with one's preschool children (p. 1200). Asendorpf and van Aken ( 2003 ) continued this line of research on personality–relationship transactions, now following 12-year-olds until age 17 ( N = 174), and tried to replicate key findings of these earlier studies. Asendorpf and van Aken confirmed previous findings and concluded that the stronger effect was that of extraversion on perceived support from peers. This result replicates, once more, similar findings in adulthood.

Sturaro, Denissen, van Aken, and Asendorpf ( 2010 ), once again, investigated the personality–relationship transaction model. The main goal of the 2010 study was to replicate the personality–relationship transaction results in an older sample (17–23 years) compared to the study of Asendorpf and van Aken ( 2003 ; 12–17 years). Sturaro et al. found some contradictory results compared to the previously described studies.

[The five-factor theory] predicts significant paths from personality to change in social relationship quality, whereas it does not predict social relationship quality to have an impact on personality change. Contrary to our expectation, however, personality did not predict changes in relationship quality. (p. 8)

In conclusion, the four articles described above clearly illustrate how theory building works in daily practice. The quotes from these articles show that researchers do possess prior knowledge, which they express in their Introduction and Discussion sections. However, all of these articles ignored this prior knowledge in the analyses because they relied on frequentist statistics, which test the null hypothesis that parameters are equal to zero. Using Bayesian statistics, we will include prior knowledge in the analysis by specifying a relevant prior distribution.

Description of the Neyer and Asendorpf ( 2001 ) Data

Participants were part of a longitudinal study of young adults. This sample started in 1995 (when participants were 18–30 years old; M age = 24.4 years, SD = 3.7) with 637 participants who were largely representative of the population of adult Germans. The sample was reassessed 4 years later (return rate = 76%). The longitudinal sample included 489 participants ( N = 226 females).

To simplify the models we focus here on only two variables: extraversion as an indicator for personality and closeness with/support by friends as an indicator for relationship quality. Quality of relationships was assessed at both occasions using a social network inventory, where respondents were asked to recall those persons who play an important role in their lives. In the present investigation, we reanalyzed the relevant data on the felt closeness with friends. Participants named on average 4.82 friends ( SD = 4.22) and 5.62 friends ( SD = 4.72) at the first and second occasions, respectively. Closeness was measured with the item: “How close do you feel to this person?” (1 = very distant to 5 = very close ). The ratings were averaged across all friends. Extraversion was assessed using the German version of the NEO-FFI (Borkenau & Ostendorf, 1993 ). Internal consistencies at both measurement occasions were .76 and .78, and the rank order stability was r = .61.

Description of the Sturaro et al. ( 2010 ) and Asendorpf and van Aken ( 2003 ) Data

Participants were part of the Munich Longitudinal Study on the Genesis of Individual Competencies (Weinert & Schneider, 1999 ). This sample started in 1984 (when participants were 3 to 4 years old) with 230 children from the German city of Munich. Participants were selected from a broad range of neighborhoods to ensure representativeness. This study focuses on reassessments of the sample at ages 12, 17, and 23. At age 12, 186 participants were still part of the sample; at age 17 this was true for 174 participants. Because the Asendorpf and van Aken ( 2003 ) publication focused on participants with complete data at both waves of data collection, the present analyses focus on the 174 individuals with data at ages 12 and 17. At age 23, a total of 154 participants were still in the sample. For this study, analyses focus on a subset of 148 individuals who provided personality self-ratings.

Measures selected for this study were taken in correspondence with the cited articles. At age 12, support by friends was measured as support from classroom friends. For the category of classroom friends, an average of 3.0 individuals was listed. For each of these individuals, participants rated the supportiveness of the relationship in terms of instrumental help, intimacy, esteem enhancement, and reliability (three items each; items of the first three scales were adapted from the NRI; Furman & Buhrmester, 1985 ). Ratings were averaged across all friends. At age 17, the same scales were repeated only for the best friend in class. In both years, if participants did not have any classroom friends, they received a score of 1 for support (the lowest possible). At age 23, support was measured using an ego-centered Social Network Questionnaire. Following the Sturaro et al. ( 2010 ) article, we focus here on the average relationship quality with same-sex peers because this measure was deemed most comparable with the peer measures at ages 12 and 17.

Extraversion at ages 12 and 17 was assessed with bipolar adjective pairs (Ostendorf, 1990 ; sample item: unsociable vs. outgoing). At age 23, extraversion was assessed with a scale from the NEO-FFI (Borkenau & Ostendorf, 1993 ; sample item: “I like to have a lot of people around me”). As reported by Sturaro et al. ( 2010 ), in a separate sample of 641 college students, the Ostendorf Scale for Extraversion and the NEO-FFI Scale for Extraversion are correlated almost perfectly after controlling for the unreliability of the scales ( r = .92).

Analytic Strategy

We used Mplus to analyze the model displayed in Figure 3. Two crucial elements have to be discussed in the Analytic Strategy section of a Bayesian article: (a) which priors were used and where did these priors come from? and (b) how was convergence assessed (see also online Appendix S1 )? Concerning the latter, we used the Gelman–Rubin criterion (for more information, see Gelman et al., 2004 ) to monitor convergence, which is the default setting of Mplus. However, as recommended by Hox, van de Schoot, and Matthijsse ( 2012 ), we set a stricter cutoff value (i.e., bconvergence = .01 ) than the default of .05. We also specified a minimum number of iterations using biterations = (10,000) , requested multiple chains of the Gibbs sampler using chains = 8 , and requested starting values based on the ML estimates using stvalues = ml . Moreover, we manually inspected all trace plots to check whether all chains converged to the same target distribution and whether all iterations used for obtaining the posterior were based on stable chains.
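The Gelman–Rubin diagnostic compares between- and within-chain variability. Mplus computes it internally; the sketch below (in Python, with simulated chains rather than output from the article's Mplus runs) is only meant to show the logic of the criterion for a single parameter:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` is an (m, n) array: m chains of n post-warmup draws each.
    Values close to 1 indicate convergence; Hox et al. (2012) recommend
    requiring R-hat - 1 < .01 rather than the looser default of .05.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(1)
# Eight well-mixed chains sampling the same target distribution
good = rng.normal(0.0, 1.0, size=(8, 10_000))
# Chains stuck at different locations: R-hat rises well above 1
bad = good + np.arange(8)[:, None]
```

Here `gelman_rubin(good)` is close to 1 while `gelman_rubin(bad)` clearly exceeds the .05 default cutoff, which is why inspecting trace plots alongside the criterion is good practice.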

Figure 3. Cross-lagged panel model where r1 is the correlation between Extraversion measured at Wave 1 and Friends measured at Wave 1, r2 is the autocorrelation between the residuals of the two variables at Wave 2, β1 and β2 are the stability paths, and β3 and β4 are the cross-loadings. T1 and T2 refer to ages 12 and 17, respectively, for the Asendorpf and van Aken ( 2003 ) data, but to ages 17 and 23, respectively, for the Sturaro, Denissen, van Aken, and Asendorpf ( 2010 ) data.
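To make the cross-lagged structure concrete, the following sketch simulates data from a model of this form and recovers the paths by least squares. All coefficient values and noise scales are illustrative assumptions, not estimates from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 489  # sample size of Neyer and Asendorpf (2001)

# Illustrative (not estimated) parameter values
b1, b2 = 0.6, 0.3   # stability paths
b3, b4 = 0.1, 0.0   # cross-loadings

extraversion_t1 = rng.normal(size=n)
# A modest dependence induces the Wave-1 correlation r1
friends_t1 = 0.2 * extraversion_t1 + rng.normal(size=n)

# Wave-2 scores follow the cross-lagged structure of Figure 3
extraversion_t2 = b1 * extraversion_t1 + b4 * friends_t1 + rng.normal(scale=0.8, size=n)
friends_t2 = b2 * friends_t1 + b3 * extraversion_t1 + rng.normal(scale=0.9, size=n)

# Least-squares recovery of the paths for each Wave-2 equation
X = np.column_stack([extraversion_t1, friends_t1])
coef_e2, *_ = np.linalg.lstsq(X, extraversion_t2, rcond=None)  # [b1, b4]
coef_f2, *_ = np.linalg.lstsq(X, friends_t2, rcond=None)       # [b3, b2]
```

With a sample of this size, the recovered coefficients land close to the generating values, which is the sense in which the cross-lagged paths are identified from two waves of data.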

Concerning the specification of the priors, we developed two scenarios. In the first scenario, we focus only on those data sets with similar age groups. Therefore, we first reanalyze the data of Neyer and Asendorpf ( 2001 ) without using prior knowledge. Thereafter, we reanalyze the data of Sturaro et al. ( 2010 ) using prior information based on the data of Neyer and Asendorpf; both data sets contain young adults between 17 and 30 years of age. In the second scenario, we assume the relation between personality and social relationships is independent of age, and we reanalyze the data of Sturaro et al. using prior information taken from Neyer and Asendorpf and from Asendorpf and van Aken ( 2003 ). In this second scenario we make a strong assumption, namely, that the cross-lagged effects for young adolescents are equal to the cross-lagged effects of young adults. This assumption implies similar developmental trajectories across age groups. We return to these issues in the Discussion section.

Based on previous research findings, Asendorpf and Wilpers ( 1998 ) hypothesized the model shown in Figure 3. As this study was the first attempt to study these variables over time, Asendorpf and Wilpers would probably have specified (had they used Bayesian statistics) an uninformative prior distribution reflecting no prior knowledge (see also Figures 1a and 1b). Neyer and Asendorpf ( 2001 ) gathered a general sample from the German population and analyzed their data. As Neyer and Asendorpf used different test–retest intervals than Asendorpf and Wilpers, we cannot use the results from Asendorpf and Wilpers as prior specifications. So, when reanalyzing Neyer and Asendorpf, we will use the default settings of Mplus, that is, noninformative prior distributions; see Figure 1b and Model 1 ( Neyer & Asendorpf, 2001 | Uninf. prior ) in the second column of Table 3. Note that "|" means conditional on, so the statement is read as the results of the Neyer and Asendorpf ( 2001 ) data conditional on an uninformative prior. Sturaro et al. ( 2010 ) continued working on the cross-lagged panel model. In the third column of Table 3, the results of Model 2 ( Sturaro et al. , 2010 | Uninf. prior ) are shown when using a noninformative prior distribution (which does not take the previous results obtained by Neyer and Asendorpf into account). What if we used our updating procedure and used the information obtained in Model 1 as the starting point for our current analysis? That is, for Model 3 ( Sturaro et al. , 2010 | Neyer & Asendorpf, 2001 ) we used the posterior means and standard deviations of the regression coefficients from Model 1 as prior specifications. Noninformative priors were used for the residual variances and for the covariances. This was done because the residuals pick up omitted variables, which almost by definition are unknown; we would have a hard time knowing what their prior relation would be to the outcome or to other variables in the model. This way, the prior for the subsequent study is a rough approximation to the posterior from the previous study.
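The mechanics of turning one study's posterior into the next study's prior can be illustrated with a univariate normal approximation. This is only a back-of-the-envelope sketch under simplifying assumptions (one parameter, known variances, conjugate normal updating); the actual analyses were full MCMC runs in Mplus, and the numbers below are hypothetical summaries, not the article's estimates:

```python
def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Precision-weighted combination of two normal summaries
    (the conjugate normal-normal case with known variances)."""
    w_prior, w_data = 1 / prior_sd**2, 1 / data_sd**2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, post_var**0.5

# Hypothetical summaries: a precise earlier study and a noisier new one
m, s = normal_update(prior_mean=0.29, prior_sd=0.05,
                     data_mean=0.16, data_sd=0.10)
```

The combined estimate `m` sits between the two means, closer to the more precise source, and the combined standard deviation `s` is smaller than either input, which is the narrowing of the PPIs discussed below.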

Table 3
Posterior Results for Scenario 1

Model 1: Neyer & Asendorpf (2001) data without prior knowledge; Model 2: Sturaro et al. (2010) data without prior knowledge; Model 3: Sturaro et al. (2010) data with priors based on Model 1.

| Parameter | Model 1 Estimate (SD) | Model 1 95% PPI | Model 2 Estimate (SD) | Model 2 95% PPI | Model 3 Estimate (SD) | Model 3 95% PPI |
| --- | --- | --- | --- | --- | --- | --- |
| β1 | 0.605 (0.037) | [0.532, 0.676] | 0.291 (0.063) | [0.169, 0.424] | 0.333 (0.060) | [0.228, 0.449] |
| β2 | 0.293 (0.047) | [0.199, 0.386] | 0.157 (0.103) | [−0.042, 0.364] | 0.168 (0.092) | [−0.010, 0.352] |
| β3 |  |  |  |  |  |  |
| β4 | −0.026 |  | 0.303 |  | 0.247 |  |
| 95% CI for difference between observed and replicated chi-square values | [−14.398, 16.188] |  | [−12.595, 17.263] |  | [−12.735, 17.298] |  |
| ppp value | .534 |  | .453 |  | .473 |  |

Note . See Figure 3 for the model being estimated and the interpretation of the parameters. SD = posterior standard deviation; PPI = posterior probability interval; CI = confidence interval; ppp value = posterior predictive p value.

As pointed out by one of the reviewers, an assumption is being made that the multiparameter posterior from a previous study can be accurately represented by independent marginal distributions on each parameter. The posterior distribution, however, captures correlations between parameters, and in regression models the coefficients can be quite strongly correlated (depending on the data). If one had strong prior beliefs about the correlations among parameters, these could be represented in a Bayesian hierarchical model. However, because these correlations are data specific in regression, and data and model specific in SEM (see Kaplan & Wenger, 1993 ), it is unlikely that we would be able to elicit such priors. Therefore, the easiest approach is to specify independent marginal priors and let the posterior capture the empirical correlations.
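The point about correlated coefficients is easy to verify: with correlated predictors, the covariance matrix of the regression coefficients, proportional to (X'X)⁻¹, has nonzero off-diagonal elements. The sketch below uses simulated data (hypothetical, not from the cited studies):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)  # predictors correlate around .7
X = np.column_stack([np.ones(n), x1, x2])      # intercept plus two predictors

# Coefficient covariance for unit error variance: (X'X)^-1
cov_beta = np.linalg.inv(X.T @ X)
corr_b1_b2 = cov_beta[1, 2] / np.sqrt(cov_beta[1, 1] * cov_beta[2, 2])
# corr_b1_b2 is strongly negative: the two slopes trade off against each other
```

Independent marginal priors ignore exactly this off-diagonal structure, which is why the posterior, rather than the prior, is left to capture the empirical correlations.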

Assuming the cross-lagged panel effects are not age dependent, Asendorpf and van Aken ( 2003 ) could have used the results from Neyer and Asendorpf ( 2001 ) as the starting point for their own analyses, which in turn could have been the starting point for Sturaro et al. ( 2010 ). In the second column of Table 4, the results of Asendorpf and van Aken without assuming prior knowledge are displayed, that is, Model 4 ( Asendorpf & van Aken, 2003 | Uninf. prior ). In the third column, that is, Model 5 ( Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001 ), the data of Asendorpf and van Aken were updated using prior information taken from Model 1. In the last step, that is, Model 6 ( Sturaro et al. , 2010 | Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001 ), the data of Sturaro et al. were updated using the prior information taken from Model 5. In sum, the models tested are as follows:

Uninformative priors

  • Neyer & Asendorpf, 2001 | Uninf. prior
  • Asendorpf & van Aken, 2003 | Uninf. prior
  • Sturaro et al., 2010 | Uninf. prior

Scenario 1: Age specificity when updating knowledge

  • Sturaro et al., 2010 | Neyer & Asendorpf, 2001

Scenario 2: Age invariance when updating knowledge

  • Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001
  • Sturaro et al., 2010 | Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001

Table 4
Posterior Results for Scenario 2

Model 4: Asendorpf & van Aken (2003) data without prior knowledge; Model 5: Asendorpf & van Aken (2003) data with priors based on Model 1; Model 6: Sturaro et al. (2010) data with priors based on Model 5.

| Parameter | Model 4 Estimate (SD) | Model 4 95% PPI | Model 5 Estimate (SD) | Model 5 95% PPI | Model 6 Estimate (SD) | Model 6 95% PPI |
| --- | --- | --- | --- | --- | --- | --- |
| β1 | 0.512 (0.069) | [0.376, 0.649] | 0.537 (0.059) | [0.424, 0.654] | 0.314 (0.061) | [0.197, 0.441] |
| β2 | 0.115 (0.083) | [−0.049, 0.277] | 0.139 (0.077) | [−0.011, 0.288] | 0.144 (0.096) | [−0.039, 0.336] |
| β3 |  |  |  |  |  |  |
| β4 |  |  |  |  |  |  |
| 95% CI for difference between observed and replicated chi-square values | [−16.253, 17.102] |  | [−16.041, 15.625] |  | [−12.712, 16.991] |  |
| ppp value | .515 |  | .517 |  | .473 |  |

When using SEM models to analyze the research questions, one is not interested in a single hypothesis test, but instead in the evaluation of the entire model. Model fit in the Bayesian context relates to assessing the predictive accuracy of a model, and is referred to as posterior predictive checking (Gelman et al., 2004 ). The general idea behind posterior predictive checking is that there should be little, if any, discrepancy between data generated by the model and the actual data itself. Any deviation between the data generated by the model and the actual data suggests possible model misspecification. In essence, posterior predictive checking is a method for assessing the specification quality of the model from the viewpoint of predictive accuracy. A complete discussion of Bayesian model evaluation is beyond the scope of this study; we refer the interested reader to Kaplan and Depaoli ( 2012 , 2013 ).

One approach to quantifying model fit is to compute the Bayesian posterior predictive p value ( ppp value). The model test statistic, the chi-square value, calculated from the observed data is compared to the same test statistic computed for data simulated from the model. The ppp value is then defined as the proportion of chi-square values obtained for the simulated data that exceed the chi-square value of the actual data. A ppp value around .50 indicates a well-fitting model.
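A ppp value can be sketched as follows. This toy example uses a simple normal model and stand-in posterior draws rather than the article's SEM in Mplus, so only the logic, not the numbers, carries over:

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data and a simple stand-in model: y ~ Normal(mu, sigma)
y_obs = rng.normal(0.0, 1.0, size=100)

# Stand-ins for posterior draws of (mu, sigma); in practice these come from MCMC
mu_draws = rng.normal(y_obs.mean(), y_obs.std() / 10, size=2000)
sigma_draws = np.abs(rng.normal(y_obs.std(), 0.05, size=2000))

def discrepancy(y, mu, sigma):
    """Chi-square-style discrepancy between data and parameter values."""
    return np.sum(((y - mu) / sigma) ** 2)

exceed = 0
for mu, sigma in zip(mu_draws, sigma_draws):
    y_rep = rng.normal(mu, sigma, size=y_obs.size)  # replicated data set
    if discrepancy(y_rep, mu, sigma) > discrepancy(y_obs, mu, sigma):
        exceed += 1

ppp = exceed / len(mu_draws)  # values near .50 indicate a well-fitting model
```

Because the stand-in model matches the data-generating process here, the resulting ppp lands near .50; systematic misspecification would push it toward 0 or 1.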

Posterior Results

In Table 3 the posterior results are displayed for the first scenario. Consider the posterior regression coefficient for the stability path of Friends (β 2 ), which is estimated as .293 in Model 1; Models 2 and 3 represent different ways of updating this knowledge. Model 2 ignores the results of Neyer and Asendorpf ( 2001 ) and arrives at a stability path of .157. Model 3, in contrast, bases the prior distributions on the posterior results of Model 1 (Neyer & Asendorpf, 2001 | Uninf. prior) and arrives at a stability path of .168, which does not differ that much from the original outcome. If we compare the posterior standard deviation of the stability path β 2 of Friends between Model 2 and Model 3 (Sturaro et al., 2010 | Neyer & Asendorpf, 2001 ), we can observe that the latter is more precise (a decrease from .103 to .092). Consequently, the 95% PPI changes from [−.042, .364] in Model 2 to [−.010, .352] in Model 3. Thus, the width of the PPI decreased and, after taking the knowledge gained from Model 1 into account, we are more confident about the estimate of the stability path of Friends.

The cross-lagged effect of Friends T1 → Extraversion T2 (β 4 ) is estimated as −.026 in Model 1 (Neyer & Asendorpf, 2001 | Uninf. prior), but as .303 in Model 2 (Sturaro et al., 2010 | Uninf. prior). When Model 1 is used as input for the prior specification for the Sturaro et al. ( 2010 ) data, in Model 3 (Sturaro et al., 2010 | Neyer & Asendorpf, 2001 ), the coefficient is pulled toward the prior and becomes .247, now with a narrower PPI. Furthermore, in both Models 2 and 3, the cross-lagged effect of Extraversion T1 → Friends T2 (β 3 ) appears not to be significant.

Concerning Scenario 2, the results of the updating procedure are shown in Table 4. Compare Models 4 (Asendorpf & van Aken, 2003 | Uninf. prior) and 5 (Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001 ), where the data of Asendorpf and van Aken ( 2003 ) were analyzed with noninformative priors and with priors based on Model 1, respectively. Again, in Model 5 the PPIs decreased compared to Model 4 because of the use of subjective priors. In Model 6 (Sturaro et al., 2010 | Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001 ), the data of Sturaro et al. ( 2010 ) were analyzed using priors based on Model 5; consequently, the posterior results of Model 6 differ from the results of Sturaro et al. in Model 2, where no prior knowledge was assumed.

Discussion of Empirical Example

Inspection of the main parameters, the cross-lagged effects β 3 and β 4 , indicates that there are hardly any differences between Scenarios 1 and 2. Apparently, the results of Sturaro et al. ( 2010 ) are robust irrespective of the specific updating procedure. However, there are differences between the updated outcomes and the original results. That is, Models 3 and 6 have smaller posterior standard deviations and narrower PPIs than Model 2. Thus, using prior knowledge in the analyses led to more certainty about the outcomes, and we can be more confident in the conclusion that Sturaro et al. found effects opposite to those of Neyer and Asendorpf ( 2001 ). This should be reassuring for those who might think that Bayesian analysis is too conservative when it comes to revising previous knowledge. The bottom line remains that effects occurring between ages 17 and 23 are different from those found for the 18–30 age range. The advantage of using priors is that the posterior probability intervals became narrower, such that the effect of the different age ranges (17–23 vs. 18–30) on the cross-lagged results can be trusted more than before.

Because developmental mechanisms may vary over time, any (reciprocal) effects found between ages 12 and 17 are not necessarily found between ages 17 and 23. Although the Sturaro et al. ( 2010 ) study was originally designed as a replication of the Asendorpf and van Aken ( 2003 ) study, the results turned out to be more consistent with the alternative explanation of the social investment theory of Roberts et al. ( 2005 ), namely, that between ages 17 and 23 there might be more change in personality because of significant changes in social roles. Although we chose to replicate the Asendorpf and van Aken study exactly (because this was the stated goal of the Sturaro et al., 2010 , study), developmental researchers should of course not blindly assume that previous research findings from different age periods can be used to derive priors. After all, development is often multifaceted and complex, and looking only for regularity might make the discovery of interesting discontinuities more difficult. In such cases, this uncertainty needs to be acknowledged explicitly and translated into prior distributions that are flatter than would be typical in research fields in which time periods are more interchangeable.

One might wonder when it is useful to use Bayesian methods instead of the default approach. Indeed, there are circumstances in which both methods produce very similar results, but there are also situations in which the two methods produce different outcomes. The advantages of Bayesian statistics over frequentist statistics are well documented in the literature (Jaynes, 2003 ; Kaplan & Depaoli, 2012 , 2013 ; Kruschke, 2011a , 2011b ; Lee & Wagenmakers, 2005 ; Van de Schoot, Verhoeven, & Hoijtink, 2012 ; Wagenmakers, 2007 ), and we highlight some of them here.

Theoretical Advantages

When the sample size is large and all parameters are normally distributed, ML estimation and Bayesian estimation are not likely to produce numerically different results. However, as we discussed in our study, there are some theoretical differences.

  • The interpretation of the results is very different; for example, see our discussion on confidence intervals. We believe that Bayesian results are more intuitive because the focus of Bayesian estimation is on predictive accuracy rather than “up or down” significance testing. Also, the Bayesian framework eliminates many of the contradictions associated with conventional hypothesis testing (e.g., Van de Schoot et al., 2011 ).
  • The Bayesian framework offers a more direct expression of uncertainty, including complete ignorance. A major difference between frequentist and Bayesian methods is that only the latter can incorporate background knowledge (or lack thereof) into the analyses by means of the prior distribution. In our study we have provided several examples on how priors can be specified and we demonstrated how the priors might influence the results.
  • Updating knowledge : Another important argument for using Bayesian statistics is that it allows updating knowledge instead of testing a null hypothesis over and over again. One important point is that having to specify priors forces one to better reflect on the similarities and differences between previous studies and one's own study, for example, in terms of age groups and retest interval (not only in terms of length but also in terms of developmental processes). Moreover, the Bayesian paradigm sometimes leads to replicating others’ conclusions or even strengthening them (i.e., in our case), but sometimes leads to different or even opposite conclusions. We believe this is what science is all about: updating one's knowledge.

Practical Advantages

In addition to the theoretical advantages, there are also many practical advantages for using Bayesian methods. We will discuss some of them.

  • Eliminating the worry about small sample sizes —albeit with possible sensitivity to priors (as it should be). Lee and Song ( 2004 ) showed in a simulation study that with ML estimation the sample size should be at least 4 or 5 times the number of parameters, but when Bayes was used this ratio decreased to 2 or 3 times the number of parameters. Also, Hox et al. ( 2012 ) showed that in multilevel designs at least 50 clusters are needed on the between level when ML estimation is used, but only 20 for Bayes. In both studies default prior settings were used and the gain in sample size reduction is even larger when subjective priors are specified. It should be noted that the smaller the sample size, the bigger the influence of the prior specification and the more can be gained from specifying subjective priors.
  • When the sample size is small, it is often hard to attain statistically significant or meaningful results (e.g., Button et al., 2013 ). In a cumulative series of studies in which coefficients fall just below significance, if all results show a trend in the same direction, Bayesian methods produce a (slowly) increasing confidence regarding the coefficients, more so than frequentist methods.
  • Handling of non-normal parameters : If parameters are not normally distributed, Bayesian methods provide more accurate results as they can deal with asymmetric distributions. An important example is the indirect effect of a mediation analysis, which is a multiplication of two regression coefficients and therefore always skewed. Therefore, the standard errors and the confidence interval computed with the classical Baron and Kenny method or the Sobel test for mediation analyses are always biased (see Zhao, Lynch, & Chen, 2010 , for an in-depth discussion). The same arguments hold for moderation analyses where an interaction variable is computed to represent the moderation effect. Alternatives are bootstrapping, or Bayesian statistics (see Yuan & MacKinnon, 2009 ). The reason that Bayes outperforms frequentist methods is that the Bayesian method does not assume or require normal distributions underlying the parameters of a model.
  • Unlikely results : Using Bayesian statistics it is possible to guard against overinterpreting highly unlikely results. For example, in a study in which one is studying something very unlikely (e.g., extrasensory perception; see the discussion in Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011 ), one can specify the priors accordingly (i.e., coefficient = 0, with high precision). This makes it less likely that a spurious effect is identified. A frequentist study is less specific in this regard. Another example is using small variance priors for cross-loadings in confirmatory factor analyses or in testing for measurement invariance (see Muthén & Asparouhov, 2012 ). The opposite might also be wanted, consider a study in which one is studying something very likely (e.g., intelligence predicting school achievement). The Bayesian method would now be more conservative when it comes to refuting the association.
  • Elimination of inadmissible parameters : With ML estimation it often happens that parameters are estimated with implausible values, for example, negative residual variances or correlations larger than 1. Because of the shape of the prior distribution for variances/covariances, such inadmissible parameters cannot occur. It should be noted, however, that often a negative residual variance is due to overfitting the model and Bayesian estimation does not solve this issue. Bayesian statistics does not provide a “golden solution” to all of one's modeling issues.
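The skewness point for indirect effects in the list above is easy to demonstrate by Monte Carlo: the product of two normally distributed path coefficients has a skewed distribution, so a symmetric Sobel-style interval and a percentile interval disagree. The path values below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sampling distributions of two path coefficients (hypothetical values)
a = rng.normal(0.30, 0.10, size=100_000)  # X -> M path estimate
b = rng.normal(0.25, 0.10, size=100_000)  # M -> Y path estimate
ab = a * b                                # indirect effect: product of paths

# Symmetric normal-theory (Sobel-style) interval vs. percentile interval
sobel = (ab.mean() - 1.96 * ab.std(), ab.mean() + 1.96 * ab.std())
percentile = tuple(np.quantile(ab, [0.025, 0.975]))

# The product is right-skewed, so the two intervals do not coincide
skewness = ((ab - ab.mean()) ** 3).mean() / ab.std() ** 3
```

The symmetric interval overshoots on the left and undershoots on the right relative to the percentile interval; Bayesian (or bootstrap) intervals follow the percentile logic and therefore respect the skew.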

In general, we do not want to make the argument for using Bayesian statistics because of its “superiority” but rather one of epistemology. That is, following De Finetti ( 1974a ), we have to come to grips as to what probability is: long-run frequency of a particular result or the uncertainty of our knowledge? This epistemological issue is more fundamental than the divergence of results between the two approaches, which is often less than dramatic.

Limitations and Future Research

Of course, the Bayesian paradigm is not without assumptions and limitations. The most often heard critique is the influence of the prior specification, which might be chosen because of opportune reasons. This could open the door to adjusting results to one's hypotheses by assuming priors consistent with these hypotheses. However, our results might somewhat assuage this critique: The Sturaro et al. ( 2010 ) results were upheld even when incorporating priors that assumed an inverse pattern of results. Nevertheless, it is absolutely necessary for a serious article based on Bayesian analysis to be transparent with regard to which priors were used and why. Reviewers and editors should require this information.

Another discussion among Bayesian statisticians is which prior distribution to use. So far, we have only discussed the uniform distribution and the normal distribution. Many more distributions are available as alternatives to the normal distribution, for example, a t distribution with heavier tails to deal with outliers (only available in WinBUGS). It might be difficult for nonstatisticians to choose among all these, sometimes exotic, distributions. The default distributions available in Amos/Mplus are suitable for most models. If an analyst requires a nonstandard or unusual distribution, be aware that most distributions are not (yet) available in Amos/Mplus, and it might be necessary to switch to other software, such as WinBUGS or packages available in R. Another critique is that in Bayesian analysis we assume that every parameter has a distribution in the population, even (co)variances. Frequentist statisticians simply do not agree with this assumption; they assume that in the population there is only one true fixed parameter value. This discussion is beyond the scope of our study, and we refer interested readers to the philosophical literature, particularly Howson and Urbach ( 2006 ), for more information.

A practical disadvantage might be that computational time increases because iterative sampling techniques are used. Fortunately, computer processors are becoming more efficient as well as cheaper to produce. Accordingly, the availability of adequate hardware to run complex models is becoming less of a bottleneck, at least in resource-rich countries. On the other hand, Bayesian analysis is able to handle highly complex models efficiently where frequentist approaches to estimation (i.e., ML) often fail (e.g., McArdle, Grimm, Hamagami, Bowles, & Meredith, 2009 ). This is especially the case for models with categorical data or random effects, where Bayes might even be faster than the numerical integration procedures most often used by default.

There are a few guidelines one should follow when reporting the analytic strategy and posterior results in a manuscript:

  • If the default settings are used, it is necessary to refer to an article/manual where these defaults are specified.
  • If subjective/informative priors are used, a subsection has to be included in the Analytic Strategy section in which the priors are specified, and it should be explicitly stated where they come from. Tables can be used if many different priors are used. If multiple prior specifications are used, as we did in all our examples, include information about the sensitivity analysis.
  • As convergence might be an issue in a Bayesian analysis (see online Appendix S1 ), and because there are not many convergence indices to rely on, information should be added about convergence, for example, by providing (some of) the trace plots as supplementary materials.

In conclusion, we believe that Bayesian statistical methods are uniquely suited to create cumulative knowledge. Because the availability of proprietary and free software is making it increasingly easy to implement Bayesian statistical methods, we encourage developmental researchers to consider applying them in their research.

Supporting Information

Additional supporting information may be found in the online version of this article at the publisher's website:

  • Albert J. Bayesian computation with R. London, UK: Springer; 2009.
  • Arbuckle JL. Amos 7.0 user's guide. Chicago, IL: SPSS; 2006.
  • Asendorpf JB, Conner M, De Fruyt F, De Houwer J, Denissen JJA, Fiedler K, Wicherts JM. Recommendations for increasing replicability in psychology. European Journal of Personality. 2013;27:108–119. doi:10.1002/per.1919.
  • Asendorpf JB, van Aken MAG. Personality-relationship transaction in adolescence: Core versus surface personality characteristics. Journal of Personality. 2003;71:629–666. doi:10.1111/1467-6494.7104005.
  • Asendorpf JB, Wilpers S. Personality effects on social relationships. Journal of Personality and Social Psychology. 1998;74:1531–1544. doi:10.1037/0022-3514.74.6.1531.
  • Bayes T, Price R. An essay towards solving a problem in the doctrine of chance. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S. Philosophical Transactions of the Royal Society of London. 1763;53:370–418. doi:10.1098/rstl.1763.0053.
  • Borkenau P, Ostendorf F. NEO-Fünf-Faktoren-Inventar nach Costa und McCrae [NEO-Five-Factor-Questionnaire as in Costa and McCrae]. Göttingen, Germany: Hogrefe; 1993.
  • Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience. 2013;14:365–376.
  • Caspi A. Personality development across the life course. In: Eisenberg N, editor. Handbook of child psychology: Vol. 3. Social, emotional, and personality development. 5th ed. New York, NY: Wiley; 1998. pp. 311–388.
  • De Finetti B. Bayesianism: Its unifying role for both the foundations and applications of statistics. International Statistical Review. 1974a;42:117–130.
  • De Finetti B. Theory of probability (Vols. 1 and 2). New York, NY: Wiley; 1974b.
  • Furman W, Buhrmester D. Children's perceptions of the personal relationships in their social networks. Developmental Psychology. 1985;21:1016–1024. doi:10.1037/0012-1649.21.6.1016.
  • Geiser C. Data analysis with Mplus. New York, NY: The Guilford Press; 2013.
  • Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. 2nd ed. London, UK: Chapman & Hall; 2004.
  • Gigerenzer G. The irrationality paradox. Behavioral and Brain Sciences. 2004;27:336–338. doi:10.1017/S0140525X04310083.
  • Howson C, Urbach P. Scientific reasoning: The Bayesian approach. 3rd ed. Chicago, IL: Open Court; 2006.
  • Hox J, van de Schoot R, Matthijsse S. How few countries will do? Comparative survey analysis from a Bayesian perspective. Survey Research Methods. 2012;6:87–93.
  • Jaynes ET. Probability theory: The logic of science. Cambridge, UK: Cambridge University Press; 2003.
  • Kaplan D, Depaoli S. Bayesian structural equation modeling. In: Hoyle R, editor. Handbook of structural equation modeling. New York, NY: Guilford Press; 2012. pp. 650–673.
  • Kaplan D. Bayesian statistical methods. In: Little TD, Depaoli S, editors. Oxford handbook of quantitative methods. Oxford, UK: Oxford University Press; 2013. pp. 407–437.
  • Kaplan D, Wenger RN. Asymptotic independence and separability in covariance structure models: Implications for specification error, power, and model modification. Multivariate Behavioral Research. 1993;28:483–498. doi:10.1207/s15327906mbr2804_4.
  • Kruschke JK. Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science. 2011a;6:299–312. doi:10.1177/1745691611406925.
  • Kruschke JK. Doing Bayesian data analysis. Burlington, MA: Academic Press; 2011b.
  • Kruschke JK. Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. 2013;142:573–603. doi:10.1037/a0029146.
  • Lee MD, Wagenmakers EJ. Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review. 2005;112:662–668. doi:10.1037/0033-295X.112.3.662.
  • Lee S-Y, Song X-Y. Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research. 2004;39:653–686.
  • Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS—A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing. 2000;10:325–337. doi:10.1007/s11222-008-9100-0.
  • Lynch S. Introduction to applied Bayesian statistics and estimation for social scientists. New York, NY: Springer; 2007.
  • McArdle JJ, Grimm KJ, Hamagami F, Bowles RP, Meredith W. Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement. Psychological Methods. 2009;14:126–149. doi:10.1037/a0015857.
  • McCrae RR, Costa PT. Towards a new generation of personality theories: Theoretical contexts for the five-factor model. In: Wiggins JS, editor. The five-factor model of personality: Theoretical perspectives. New York, NY: Guilford Press; 1996. pp. 51–87.
  • Meeus W, Van de Schoot R, Keijsers L, Schwartz SJ, Branje S. On the progression and stability of adolescent identity formation: A five-wave longitudinal study in early-to-middle and middle-to-late adolescence. Child Development. 2010;81:1565–1581. doi:10.1111/j.1467-8624.2010.01492.x.
  • Mulder J, Hoijtink H, de Leeuw C. BIEMS: A Fortran 90 program for calculating Bayes factors for inequality and equality constrained models. Journal of Statistical Software. 2012;46:2.
  • Muthén B, Asparouhov T. Bayesian SEM: A more flexible representation of substantive theory. Psychological Methods. 2012;17:313–335. doi:10.1037/a0026802.
  • Muthén LK, Muthén BO. Mplus user's guide. 7th ed. Los Angeles, CA: Muthén & Muthén; 1998–2012.
  • Neyer FJ, Asendorpf JB. Personality-relationship transaction in young adulthood. Journal of Personality and Social Psychology. 2001;81:1190–1204. doi:10.1037/0022-3514.81.6.1190.
  • Ntzoufras I. Bayesian modeling using WinBUGS. Hoboken, NJ: Wiley; 2011.
  • O'Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ, Rakow T. Uncertain judgements: Eliciting experts' probabilities. West Sussex, UK: Wiley; 2006.
  • Ostendorf F. Sprache und Persönlichkeitsstruktur: Zur Validität des Fünf-Faktoren-Modells der Persönlichkeit [Language and personality structure: Validity of the five-factor model of personality]. Regensburg, Germany: Roderer; 1990.
  • Press SJ. Subjective and objective Bayesian statistics: Principles, models, and applications. 2nd ed. New York, NY: Wiley; 2003.
  • Rietbergen C, Klugkist I, Janssen KJM, Moons KGM, Hoijtink H. Incorporation of historical data in the analysis of randomized therapeutic trials. Contemporary Clinical Trials. 2011;32:848–855. doi:10.1016/j.cct.2011.06.002.
  • Roberts BW, Wood D, Smith JL. Evaluating five factor theory and social investment perspectives on personality trait development. Journal of Research in Personality. 2005;39:166–184. doi:10.1016/j.jrp.2004.08.002.
  • Rowe ML, Raudenbush SW, Goldin-Meadow S. The pace of vocabulary growth helps predict later vocabulary skill. Child Development. 2012;83:508–525. doi:10.1111/j.1467-8624.2011.01710.x.
  • Stigler SM. Laplace's 1774 memoir on inverse probability. Statistical Science. 1986;1:359–363.
  • Sturaro C, Denissen JJA, van Aken MAG, Asendorpf JB. Person-environment transactions during emerging adulthood: The interplay between personality characteristics and social relationships. European Psychologist. 2010;13:1–11. doi:10.1027/1016-9040.13.1.1.
  • Van de Schoot R, Hoijtink H, Mulder J, Van Aken MAG, Orobio de Castro B, Meeus W, Romeijn J-W. Evaluating expectations about negative emotional states of aggressive boys using Bayesian model selection. Developmental Psychology. 2011;47:203–212. doi:10.1037/a0020957.
  • Van de Schoot R, Verhoeven M, Hoijtink H. Bayesian evaluation of informative hypotheses in SEM using Mplus: A black bear story. European Journal of Developmental Psychology. 2012;10:81–98. doi:10.1080/17405629.2012.732719.
  • Van Wesel F. Priors & prejudice: Using existing knowledge in social science research (doctoral dissertation). Utrecht University; 2011.
  • Wagenmakers EJ. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review. 2007;14:779–804. doi:10.3758/BF03194105.
  • Wagenmakers EJ, Wetzels R, Borsboom D, van der Maas HLJ. Why psychologists must change the way they analyze their data: The case of psi. Journal of Personality and Social Psychology. 2011;100:426–432. doi:10.1037/a0022790.
  • Walker LJ, Gustafson P, Frimer JA. The application of Bayesian analysis to issues in developmental research. International Journal of Behavioral Development. 2007;31:366–373. doi:10.1177/0165025407077763.
  • Weinert FE, Schneider W. Individual development from 3 to 12: Findings from the Munich Longitudinal Study. New York, NY: Cambridge University Press; 1999.
  • Yuan Y, MacKinnon DP. Bayesian mediation analysis. Psychological Methods. 2009;14:301–322. doi:10.1037/a0016972.
  • Zhang Z, Hamagami F, Wang L, Grimm KJ, Nesselroade JR. Bayesian analysis of longitudinal data using growth curve models. International Journal of Behavioral Development. 2007;31:374–383. doi:10.1177/0165025407077764.
  • Zhao X, Lynch JG Jr, Chen Q. Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research. 2010;37:197–206. doi:10.1086/651257.


Open Access

Peer-reviewed

Finemap-MiXeR: A variational Bayesian approach for genetic finemapping

Affiliations: Centre for Precision Psychiatry, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway; Constructor University Bremen, Bremen, Germany; Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Center for Multimodal Imaging and Genetics, University of California San Diego, California, United States of America

* E-mail: [email protected]

  • Bayram Cevdet Akdeniz, 
  • Oleksandr Frei, 
  • Alexey Shadrin, 
  • Dmitry Vetrov, 
  • Dmitry Kropotov, 
  • Eivind Hovig, 
  • Ole A. Andreassen, 
  • Anders M. Dale


  • Published: August 15, 2024
  • https://doi.org/10.1371/journal.pgen.1011372

This is an uncorrected proof.

Table 1

Genome-wide association studies (GWAS) implicate broad genomic loci containing clusters of highly correlated genetic variants. Finemapping techniques can select and prioritize the variants within each GWAS locus that are more likely to have a functional influence on the trait. Here, we present a novel method, Finemap-MiXeR, for finemapping causal variants from GWAS summary statistics, controlling for correlation among variants due to linkage disequilibrium. Our method is based on a variational Bayesian approach and direct optimization of the Evidence Lower Bound (ELBO) of the likelihood function derived from the MiXeR model. After obtaining the analytical expression for the ELBO's gradient, we apply the Adaptive Moment Estimation (ADAM) algorithm for optimization, allowing us to obtain the posterior causal probability of each variant. Using these posterior causal probabilities, we validated Finemap-MiXeR across a wide range of scenarios using both synthetic data and real data on height from the UK Biobank. Comparison of Finemap-MiXeR with two existing methods, FINEMAP and SuSiE RSS, demonstrated similar or improved accuracy. Furthermore, our method is computationally efficient in several respects. For example, unlike many other methods in the literature, its computational complexity does not increase with the number of true causal variants in a locus, and it does not require any matrix inversion operation. The mathematical framework of Finemap-MiXeR is flexible and may also be applied to other problems, including cross-trait and cross-ancestry finemapping.

Author summary

Genome-wide association studies report the effect size of each genomic variant as summary statistics. Due to the correlated structure of genomic variants, it is not straightforward to determine the actual causal variants from these summary statistics alone. Finemapping studies aim to identify these causal SNPs using different approaches. Here, we present a novel finemapping method, called Finemap-MiXeR, that determines the causal variants using summary statistics and a weighted linkage disequilibrium matrix as input. Our method is based on variational Bayesian inference on the MiXeR model: the Evidence Lower Bound of the model is derived to obtain a tractable optimization function, the first derivatives of this Evidence Lower Bound are determined analytically, and Adaptive Moment Estimation is applied to perform the optimization. Our method has been validated on synthetic and real data, showing similar or better performance than existing finemapping tools.

Citation: Akdeniz BC, Frei O, Shadrin A, Vetrov D, Kropotov D, Hovig E, et al. (2024) Finemap-MiXeR: A variational Bayesian approach for genetic finemapping. PLoS Genet 20(8): e1011372. https://doi.org/10.1371/journal.pgen.1011372

Editor: Heather J. Cordell, Newcastle University, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND

Received: January 25, 2024; Accepted: July 17, 2024; Published: August 15, 2024

Copyright: © 2024 Akdeniz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The Finemap-MiXeR method is available as a MATLAB package and can also be used with Singularity or Docker containers without using MATLAB. The source code and tool can be obtained from GitHub ( https://github.com/bayramakdeniz/Finemap-MiXeR ) along with a user tutorial and example data. The datasets analyzed during the current study are available for download from the following URLs: https://portals.broadinstitute.org/collaboration/giant/images/c/c8/Meta-analysis_Locke_et_al%2BUKBiobank_2018_UPDATED.txt.gz ; UK Biobank accessed via application 27412, https://ams.ukbiobank.ac.uk/ams/ (upon application); 1000 Genomes Phase3 data, http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ ; the synthetic data is accessible via: https://github.com/comorment/containers/tree/main/reference/hapgen . The summary statistics for AD are accessible via: https://vu.data.surfsara.nl/index.php/s/LGjeIk6phQ6zw8I/download?path=%2F&files=Wightmanetal2023_NBA_GenomicSEMCommonFactor_SummaryStatistics.txt.gz The summary statistics for PD are accessible via: https://www.sciencedirect.com/science/article/pii/S1474442219303205?via%3Dihub#cesec120 .

Funding: The authors were funded by the South-Eastern Norway Regional Health Authority (#2022073 to B.C.A. and O.F.), the Research Council of Norway (#324499 to O.F. and #326813 to A.S.), a Norway grant (#EEA-RO-NO-2018-0573 to A.S.) and NordForsk to the NeIC Heilsa “Tryggvedottir” (#101021 to B.C.A. and E.H.). This research has been conducted using the UK Biobank Resource under Application Number 27412. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Dr. AMD is a Founder of and holds equity in CorTechs Labs, Inc, and serves on its Scientific Advisory Board. He is a member of the Scientific Advisory Board of Human Longevity, Inc. and receives funding through research agreements with General Electric Healthcare and Medtronic, Inc. The terms of these arrangements have been reviewed and approved by UCSD in accordance with its conflict-of-interest policies. Dr. OAA is a consultant for cortechs.ai, and received speaker’s honorarium from Janssen, Lundbeck and Sunovion unrelated to the topic of this study. The remaining authors have no competing interest.

Introduction

Genome-wide association studies (GWAS) have discovered hundreds of genomic loci associated with complex human traits and disorders [ 1 ]. GWAS test for association between genomic variants, called single nucleotide polymorphisms (SNPs), and the corresponding traits of interest. The results of a GWAS are available as summary statistics, including the association effect size, standard error, and statistical significance (z-scores or p-values) for each SNP. While many SNPs may show a significant association, most of these associations are likely to be driven by linkage disequilibrium (LD), i.e., through correlation with a neighboring causal variant rather than through a direct functional influence on the trait [ 2 ]. Causal SNPs may also be missed in GWAS due to insufficient statistical power or unmeasured or unimputed SNPs [ 3 ]. Statistical finemapping methods aim to identify causal SNPs within a given locus after controlling for LD.


Benner et al. developed a computationally efficient method called FINEMAP [ 11 ] that calculates the likelihood function using Cholesky decomposition and then searches possible causal configurations via Shotgun Stochastic Search [ 12 ]. Thanks to these improvements, the computational complexity is reduced while preserving the same accuracy as previous Bayesian exhaustive search methods such as CAVIARBF. An extension of the FINEMAP method [ 13 ] can also estimate the effect sizes of causal variants and the heritability attributed to the locus being analysed.

Another recent approach to finemapping applies a modified version of the Single Effect Regression model [ 7 ], called the Sum of Single Effects (SuSiE) [ 14 ]. The main idea behind this method is to optimize the proposed model by eliminating the effect of each causal SNP using Iterative Bayesian Stepwise Selection (IBSS). Compared to other Bayesian variable selection methods [ 15 , 16 ], SuSiE has lower computational complexity and is more suitable for inference on highly correlated variables. It was demonstrated that SuSiE had better accuracy than previously published finemapping methods [ 14 ]. While the original SuSiE algorithm requires individual-level genotype and phenotype data as input, it has been extended to the SuSiE RSS method, which requires only summary-statistics-level data [ 17 ]. SuSiE RSS yields accuracy similar to the original SuSiE algorithm while reducing the computational complexity.

Despite the effectiveness of currently available finemapping methods, they can still be improved in terms of both accuracy and computation. Here, we present the novel Finemap-MiXeR method, based on a variational Bayesian approach leveraging the MiXeR model [ 2 ]. The MiXeR model assumes a biologically plausible prior distribution of SNPs and can estimate the heritability, polygenicity and discoverability of a given trait, as well as the polygenic overlap between two traits [ 18 ]. In Finemap-MiXeR, following the variational Bayesian approach, the likelihood function of observing the GWAS z-scores is replaced with its Evidence Lower Bound (ELBO). We analytically determined the derivatives of the ELBO function and optimized it with the Adaptive Moment Estimation (ADAM) algorithm [ 19 ]. The method requires summary statistics and a scaled LD matrix as input, and outputs the posterior probability of each SNP being causal, namely the posterior causal probabilities .
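To make the idea of replacing a likelihood with its ELBO and maximizing it by gradient ascent concrete, consider a deliberately simple toy problem (a Gaussian mean with a conjugate prior, not the MiXeR model), where the variational optimum can be checked against the known exact posterior:

```python
import numpy as np

# Toy model (NOT the MiXeR model): data x_i ~ N(mu, 1), prior mu ~ N(0, 1).
# The exact posterior is N(sum(x)/(n+1), 1/(n+1)). We approximate it with
# q(mu) = N(m, s^2) and maximise the ELBO over (m, log s) by gradient ascent.
rng = np.random.default_rng(42)
x = rng.normal(2.0, 1.0, size=50)
n, sx = len(x), x.sum()

m, log_s = 0.0, 0.0
for _ in range(4000):
    s2 = np.exp(2 * log_s)
    grad_m = sx - (n + 1) * m          # dELBO/dm
    grad_log_s = 1.0 - (n + 1) * s2    # dELBO/d(log s)
    m += 1e-3 * grad_m
    log_s += 1e-2 * grad_log_s

# At the optimum, m -> sum(x)/(n+1) and exp(2*log_s) -> 1/(n+1),
# i.e., the variational parameters recover the exact posterior moments.
```

In Finemap-MiXeR the same principle applies, but the variational family is the spike-and-slab form q(β, u) and the gradients are the analytical ELBO derivatives described below.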

Our proposed finemapping method has several advantages over existing tools. First, we show increased accuracy of Finemap-MiXeR over FINEMAP and SuSiE in detecting causal SNPs in simulations, across a broad range of scenarios. Furthermore, although the gain in overall performance relative to the other methods is limited, our method can also detect causal variants that the other methods fail to identify in some scenarios. We also validated our method in applications to height, Alzheimer's disease (ALZ) and Parkinson's disease (PD). The computational complexity of Finemap-MiXeR increases only with the number of SNPs per locus (M); unlike other methods, it does not increase with the number of causals (k) or the locus's heritability ( h 2 ). Furthermore, unlike many existing finemapping methods, Finemap-MiXeR does not require computing the inverse of the LD matrix, an important practical consideration that has been addressed in various studies, such as [ 20 , 21 ]. Finally, the flexibility of our mathematical framework provides possibilities to extend the current approach in various directions, such as finemapping across multiple traits or ancestries (for details, see the Discussion section). Taken together, these advantages of Finemap-MiXeR represent an important step forward in our ability to disentangle biological insights from the associations observed in GWAS.

Description of the method

Ethics statement.

The UK Biobank was granted ethical approval by the North West Multi-centre Research Ethics Committee (MREC) to collect and distribute data and samples from the participants ( http://www.ukbiobank.ac.uk/ethics/ ); this approval covers the work in this study, which was performed under UK Biobank application number 27412. All participants included in these analyses gave written consent to participate.

Variational Bayesian inference on the MiXeR model


Using this model and parametric family, we can optimize ℒ( q , θ ) and obtain the parameters of q ( β , u ), which correspond to the posterior causal probability of each SNP (q i ) and a parameter ( μ i ) indicating the corresponding effect size. Note that we use the same parametric family q ( β , u ) as proposed in [ 23 ], which applied a variational Bayesian approach to polygenic risk score (PRS) analysis. Our method differs in that we optimize the ELBO function directly using its analytical derivatives, as an alternative to the variational EM algorithm used in [ 23 ]. Moreover, our application focuses specifically on the accuracy of finemapping causal variants, rather than genome-wide polygenic risk prediction, and was developed accordingly.

Derivation of the derivatives of the ELBO function

In order to perform the optimization of ℒ( q , θ ), we use the Adaptive Moment Estimation (ADAM) algorithm, which computes an adaptive learning rate for each parameter from the first derivatives of ℒ( q , θ ). We therefore need to calculate the corresponding derivatives with respect to μ i , σ i and q i analytically.
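The ADAM update itself is straightforward to sketch. The following minimal implementation (ours, not the Finemap-MiXeR code) shows how bias-corrected first and second moments of the gradient yield a per-parameter learning rate:

```python
import numpy as np

def adam(grad, x0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    """Adaptive Moment Estimation: each parameter gets its own effective
    step size from bias-corrected running moments of its gradient."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)   # first moment (running mean of gradients)
    v = np.zeros_like(x)   # second moment (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)   # bias correction for zero initialization
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy use: minimise f(x) = ||x - 3||^2. Maximising an ELBO is the same
# update with the gradient's sign flipped.
x_opt = adam(lambda x: 2.0 * (x - 3.0), np.zeros(3))
```

In Finemap-MiXeR, `grad` would return the analytical ELBO derivatives with respect to μ i , σ i and q i (with the sign flipped for maximization).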



T_A is a function of a_ij and z_j; for details, see S1 Notes .

https://doi.org/10.1371/journal.pgen.1011372.t001

Hyperparameters


Credible sets


Step 1. Sort the obtained posterior causal probabilities ( q i ) in descending order, and let Q be the list of these sorted variants.

Step 2. Seed L candidate credible sets by choosing the L variants with the highest q i whose pairwise absolute correlation is lower than η .

Step 3. Add further variants from the list Q to these sets when their absolute correlation with a set is higher than η , and remove the added variants from Q.

Step 4. Repeat Step 3 for each set until it satisfies P k > q thr .

Step 5. Discard sets that do not satisfy P k > q thr , and report the resulting L * credible sets, where L * ≤ L .

Following this procedure, we can report multiple credible sets that include variants whose absolute correlation is greater than η and that satisfy P k > q thr . The initial number of credible sets (L) does not need to be determined by the user. Since the hyperparameters can be optimized during the finemapping procedure, we can obtain an optimized π 1 , which implies that the number of causal variants is M π 1 . Therefore, choosing L larger than this number (in our simulations, L = ⌈ M π 1 ⌉, where ⌈.⌉ is the ceiling operator) is sufficient to capture all possible credible sets. In fact, L can be any number bigger than M π 1 : the results are not sensitive to the choice of L for the 0.95 credible-set threshold, provided it exceeds the number of causals (see Fig E in S1 Text for details). This is expected, since a bigger L may initially construct a larger number of credible sets, but the redundant sets are eventually eliminated at Step 5. Choosing L lower than M π 1 may, however, lead to missing some possible credible sets.

The choices of q thr and η also affect the number of possible credible sets. If q thr is chosen lower than the conventional threshold (0.95), we may expect a larger number of credible sets (in such cases, L can be internally and automatically adjusted to a number higher than L = ⌈ M π 1 ⌉, depending on the chosen q thr ). Similarly, if η is chosen too low, two (or more) true causal variants whose absolute correlation is greater than η may end up in the same credible set, thus reducing the number of credible sets.
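Our reading of the five steps can be sketched as follows. The growth rule (correlation with the set's seed variant) and the definition of P k as the sum of q i within a set are our assumptions, not details taken from the paper:

```python
import numpy as np

def credible_sets(q, R, L, eta=0.5, q_thr=0.95):
    """Sketch of the five-step procedure. q: posterior causal probabilities,
    R: SNP-SNP correlation matrix, L: initial number of candidate sets."""
    # Step 1: sort variants by q in descending order
    Q = list(np.argsort(q)[::-1])
    # Step 2: seed up to L sets with top variants whose pairwise |corr| < eta
    seeds = []
    for i in Q:
        if len(seeds) < L and all(abs(R[i, j]) < eta for j in seeds):
            seeds.append(i)
    sets = [[s] for s in seeds]
    Q = [i for i in Q if i not in seeds]
    # Steps 3-4: grow each set with variants correlated (> eta) to its seed
    # until the set's cumulative probability P_k exceeds q_thr
    for cs in sets:
        for i in list(Q):
            if sum(q[j] for j in cs) > q_thr:
                break
            if abs(R[cs[0], i]) > eta:
                cs.append(i)
                Q.remove(i)
    # Step 5: discard sets that never reached q_thr
    return [cs for cs in sets if sum(q[j] for j in cs) > q_thr]
```

For example, with two pairs of highly correlated SNPs carrying most of the posterior mass, the sketch returns two credible sets, one per pair.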

Computational complexity of Finemap-MiXeR


Reducing computational complexity with Finemap-MiXeR-PCA


Verification and comparison

We compared the Finemap-MiXeR method with FINEMAP (FINEMAP 1.4) and SuSiE RSS in terms of accuracy and runtime performance. Using synthetic data with known locations of causal variants, the accuracy of the methods was measured by the area under the receiver operating characteristic (ROC) curve. When comparing the methods using real data on height from the UK Biobank (UKB), the true locations of causal variants are unknown; we therefore used a proxy measure of finemapping accuracy, evaluating how well the phenotype can be predicted from the SNPs selected as causal by each method. For runtime performance, we also compared Finemap-MiXeR-PCA and SuSiE in the “Runtime Performance and Computational Complexity” section below, but omitted their accuracy results, since these were almost identical to those of Finemap-MiXeR and SuSiE RSS, respectively (see Fig A in S1 Text ). Note that, for the sake of fair comparison, exactly the same data were used for all methods in all experiments. We also applied the Finemap-MiXeR method to Alzheimer's disease and Parkinson's disease. For the synthetic data, we assigned the causal variants and simulated the phenotype accordingly, while for height, Alzheimer's disease (ALZ) and Parkinson's disease (PD), we chose loci with at least one SNP strongly associated with the trait and then applied finemapping to these loci, as defined elsewhere [ 4 ].

Simulation with synthetic data


Using the procedure described above and sketched in Fig 1 , we randomly chose a locus from this synthetic genome data, obtained the corresponding G matrix, and then determined the artificial phenotype vector y for different values of M and h 2 with N = 10 000. This procedure was repeated 50 times in each scenario, in particular for different numbers of causals. We also repeated the same simulation procedure with the causal SNPs randomly pre-assigned via a vector β drawn from a normal distribution, where β i = N(0,1) if the SNP is causal and 0 otherwise.


First, we randomly selected a locus containing a pre-defined number (M) of adjacent SNPs, randomly selected k causal variants within the locus, and drew their effect sizes (vector β ). Then, we used synthetic genotype data (G) with realistic LD structure, generated by the hapgen2 tool, to calculate the phenotypic values (y) for all individuals using an additive genetic model ( y = c 1 Gβ + c 2 ϵ ), where the scaling constants c 1 and c 2 were chosen to yield Var ( y ) = 1 and Var ( c 1 Gβ ) = h 2 (a pre-defined value indicating the true heritability of the locus). Using G and y, we calculated z-scores by running a GWAS and then used them as inputs for the tools to obtain the posterior causal probability of each SNP. Since the ground truth (the locations of the causal variants) is known, we then determined receiver operating characteristic (ROC) curves for Finemap-MiXeR and the comparison methods (SuSiE RSS and FINEMAP 1.4) and calculated the corresponding area under the curve (AUC).

https://doi.org/10.1371/journal.pgen.1011372.g001
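The simulation recipe in the figure can be sketched as follows; a random binomial genotype matrix stands in for the hapgen2 output, and all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, k, h2 = 10_000, 200, 5, 0.01   # individuals, SNPs, causals, locus h^2

# Stand-in genotype matrix (hapgen2 output in the paper); columns standardized
G = rng.binomial(2, 0.3, size=(N, M)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)

beta = np.zeros(M)
causal = rng.choice(M, size=k, replace=False)
beta[causal] = 1.0          # or rng.standard_normal(k) for the N(0,1) scenario

g = G @ beta
eps = rng.standard_normal(N)
# Choose scaling constants so that Var(c1*G*beta) = h2 and Var(y) ~ 1
c1 = np.sqrt(h2 / g.var())
c2 = np.sqrt((1 - h2) / eps.var())
y = c1 * g + c2 * eps

# Marginal GWAS z-scores for standardized genotypes: z_j = G_j . y / sqrt(N)
z = G.T @ y / np.sqrt(N)
```

The vector `z` would then be passed, together with the LD matrix of the locus, to each finemapping tool.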

Using the posterior causal probability of each SNP (q i ), we evaluated the power to detect the actual causal variants, obtained the corresponding receiver operating characteristic (ROC) curve for the three methods, and calculated the area under these curves (AUC) for comparison. The AUC values are presented in Fig 2A (simulations where β i = 1 for causal SNPs) and Fig 2B (simulations where β i = N(0,1) for causal SNPs); the values in the figures are averages over 50 experiments. As can be seen in Fig 2 , Finemap-MiXeR either outperforms the other methods, especially for lower heritability and/or higher polygenicity, or performs similarly to them. The mean AUC across all experiments in Fig 2A (the 5x3x3 = 45 configurations) is 0.870 for Finemap-MiXeR, 0.851 for SuSiE RSS and 0.856 for FINEMAP; for Fig 2B , the corresponding values are 0.819, 0.802 and 0.808. The corresponding areas under the precision-recall curves (AUPRC) are presented in Figs C and D in S1 Text .
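The AUC evaluation reduces to ranking SNPs by posterior causal probability against the known causal labels; a minimal Mann-Whitney formulation (our sketch, not the paper's code):

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney AUC: the probability that a randomly chosen causal SNP
    receives a higher score than a randomly chosen non-causal SNP."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()   # causal ranked above non-causal
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

Here `auc(q, truth)` would be called with `q` the posterior causal probabilities and `truth` the indicator vector of simulated causal variants.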


(A) Area under the ROC curve (AUC) comparison of Finemap-MiXeR with SuSiE RSS and FINEMAP across different scenarios, varying: the size of the locus being analyzed (M = 200, 1000, 2000, 4000, or 8000 SNPs per locus, shown in rows); the true number of causal variants (k = 1, 5, or 10, shown in columns); and the true heritability within the locus (h2 = 0.001, 0.005, or 0.01, shown on the horizontal axis of each panel). Effect sizes of causal SNPs are assigned as β i = 1 and then adjusted to the given heritability. Values shown are averaged across 50 simulations, with corresponding standard errors. (B) As in (A), but with effect sizes of causal SNPs drawn as β i = N(0,1) and then adjusted to the given heritability.

https://doi.org/10.1371/journal.pgen.1011372.g002

The performance of the variation of our method (Finemap-MiXeR-PCA) is shown in Fig A in S1 Text . Its performance is very similar to that of Finemap-MiXeR. In the same figure, we also compare the performance of our method when the hyperparameters are known and supplied by the user versus when they are optimized using the corresponding derivatives. The performance of our methods (Finemap-MiXeR and Finemap-MiXeR-PCA) is almost the same when the hyperparameters are optimized within the algorithm.

We also plotted a one-to-one comparison of the posterior causal probabilities obtained from Finemap-MiXeR and SuSiE RSS using the scatter plots presented in Fig B in S1 Text . As the figure shows, SuSiE RSS and Finemap-MiXeR may assign different posterior causal probabilities to several SNPs. On the other hand, as can be seen from the histograms of causal and non-causal SNPs in Fig B in S1 Text , their distributions are similar. More importantly, Finemap-MiXeR was able to detect some causal SNPs that were not detected by SuSiE RSS (or by other methods that provide similar posteriors to SuSiE RSS). Given that Finemap-MiXeR, SuSiE RSS, and FINEMAP have similar accuracy in detecting causal SNPs, it is valuable to identify causal SNPs that may not be detected by other methods. In these experiments, 7.2 percent of causal SNPs were detected only by Finemap-MiXeR (and not by SuSiE RSS), while 4.2 percent were detected only by SuSiE RSS (and not by Finemap-MiXeR). The corresponding percentages for the comparison between SuSiE RSS and FINEMAP are lower, at 1.2 and 1.1 percent, respectively. Such diversity in posterior causal probabilities suggests using Finemap-MiXeR and SuSiE RSS (or other methods) together to detect more possible causal SNPs.
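The "detected only by one method" percentages above can be computed from the two posterior vectors and the ground-truth causal labels. The sketch below is illustrative only; the detection threshold `thr` is a hypothetical choice, not the criterion used in the paper.

```python
import numpy as np

def exclusive_detection(q_a, q_b, is_causal, thr=0.5):
    """Fraction of true causal SNPs detected (posterior > thr) by
    method A but not by method B, and vice versa.
    thr is an illustrative threshold, not the paper's criterion."""
    q_a, q_b = np.asarray(q_a, float), np.asarray(q_b, float)
    causal = np.asarray(is_causal, dtype=bool)
    det_a = (q_a > thr) & causal   # causal SNPs method A flags
    det_b = (q_b > thr) & causal   # causal SNPs method B flags
    n = causal.sum()
    only_a = (det_a & ~det_b).sum() / n
    only_b = (det_b & ~det_a).sum() / n
    return only_a, only_b

# Toy example: three causal SNPs, each method misses a different one.
only_a, only_b = exclusive_detection([0.9, 0.6, 0.1],
                                     [0.9, 0.2, 0.7],
                                     [1, 1, 1])
print(only_a, only_b)
```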

We also examined the performance of credible sets and compared it with that of SuSiE RSS's credible sets on several metrics. One metric is coverage, the probability that a credible set includes at least one causal SNP. The other is power, the total proportion of causal SNPs detected across all reported credible sets. Using a simulation procedure similar to the one described above, we examined these metrics in regimes of varying heritability and polygenicity. As can be seen in Fig 3 , SuSiE RSS has slightly better coverage than Finemap-MiXeR in some scenarios. On the other hand, Finemap-MiXeR mostly detects more causal SNPs and thus has higher power than SuSiE RSS. Furthermore, one can observe from Fig 3 that as the heritability (h 2 ) decreases and/or the number of causal variants (k) increases, both the power and the coverage of both methods decrease. This is expected and can also be observed in the AUC results in Fig 2 : when heritability is lower or the number of causal variants is higher, the signal per causal variant is reduced, making causal variants harder to detect. The scenarios in the second row of Fig 3 (h 2 = 0.01) illustrate this: for k = 10, the power of Finemap-MiXeR and SuSiE is 0.21 and 0.17, respectively, with corresponding coverage of 0.50 and 0.51. This implies that each method detected around 20% of the causal variants in credible sets (equivalent to 2 causal variants out of k = 10) with a coverage of 0.5. By contrast, as can be seen in the fourth row of Fig 3 , when the heritability is higher (h 2 = 0.04), the power for k = 10 increases to 0.53 and 0.51, and coverage increases to 0.76 and 0.79, respectively.
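The two credible-set metrics have a direct translation into code. The sketch below computes them for hypothetical credible sets and causal labels, following the definitions above (coverage: fraction of sets containing at least one causal SNP; power: fraction of causal SNPs appearing in any set).

```python
def coverage_and_power(credible_sets, causal_snps):
    """Coverage: fraction of reported credible sets containing at
    least one true causal SNP. Power: fraction of causal SNPs that
    appear in any reported credible set."""
    causal = set(causal_snps)
    # A set "covers" if it intersects the causal SNPs.
    hits = sum(1 for cs in credible_sets if causal & set(cs))
    coverage = hits / len(credible_sets)
    # Causal SNPs captured anywhere across all sets.
    detected = set().union(*credible_sets) & causal
    power = len(detected) / len(causal)
    return coverage, power

# Toy example: 3 credible sets, true causal SNPs {2, 7, 11}.
cs = [[1, 2, 3], [7, 8], [20, 21]]
print(coverage_and_power(cs, [2, 7, 11]))
```

Here two of three sets contain a causal SNP and two of three causal SNPs are captured, so both metrics are 2/3.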


The effect sizes of causal SNPs are drawn as β i ~ N(0,1) and then adjusted to the given heritability. The first column shows coverage, the probability that a credible set includes at least one causal SNP (equivalent to 1 minus the false coverage rate); the second column shows the corresponding power, i.e., the total proportion of causal SNPs detected across all reported credible sets. The third column gives the average size (number of variants) of the credible sets. η is set to 0.5 and q thr = 0.95, both as suggested for SuSiE.

https://doi.org/10.1371/journal.pgen.1011372.g003

Application to UKB height data

We used UK Biobank (UKB) genotype data (N = 337,145 after QC) with standing height as the phenotype to evaluate the performance of the Finemap-MiXeR method on real data. UK Biobank data were obtained under accession number 27412. Our UKB data included 12,926,669 SNPs and 337,145 subjects, derived from the UKB imputed v3 dataset. During sample QC, we selected unrelated individuals of white British ancestry, removed sex chromosome aneuploidy, and excluded participants who had withdrawn their consent. SNP-based QC was applied as follows: "plink --maf 0.001 --hwe 1e-10 --geno 0.1", in addition to filtering SNPs with an imputation INFO score below 0.8 and excluding SNPs with duplicated RS IDs. Since the ground-truth causal variants for height are not known, we compared the three methods by predicting height using the SNPs finemapped by each algorithm and then evaluating the correlation between predicted and actual height.

Since the main purpose of finemapping is not phenotype prediction, prediction performance should not be considered the ultimate metric for comparing the accuracy of finemapping methods. On the other hand, for highly polygenic and heritable phenotypes such as height, the ground-truth causal variants may not be well known, so it is still interesting and useful to compare methods by the predictive performance of the finemapped SNPs.

To this end, we split the individual-level UKB data into 80% for training and 20% for testing. The training set was used to perform finemapping and to estimate the weight of the finemapped SNP in a linear predictor of height; the test set was used to predict height and to evaluate the correlation with measured height.

We conducted this procedure for multiple loci associated with height. In particular, we chose loci strongly associated with height based on the p-values of their lead SNPs in a recent height GWAS [ 28 ]. We examined 31 loci whose lead SNP p-value was below 10 −60 and whose locus size was below 10,000 SNPs. Note that these loci vary in h2 and M (for details, see Table A in S1 Text ). To obtain input data for the methods, we applied GWAS to those loci using the training set and obtained the corresponding z-scores. Using these z-scores, we ran the three algorithms (Finemap-MiXeR, SuSiE RSS and FINEMAP) and obtained the posterior causal probability of each SNP.

Afterwards, for each method, we used the SNP with the highest posterior causal probability to estimate height using Multiple Linear Regression (MLR). We estimated the effect-size coefficient of this finemapped SNP on the training data and then applied this coefficient to the test data to evaluate performance.
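The evaluation protocol can be sketched end to end. Everything below is simulated stand-in data (hypothetical genotypes, a phenotype driven by one SNP, and a pretend "finemapped" top SNP); it only illustrates the split/fit/predict/R² pipeline, not the paper's actual analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in: genotypes (0/1/2) for 1000 individuals x 50 SNPs;
# phenotype driven by SNP 10 plus noise (illustration only).
G = rng.binomial(2, 0.3, size=(1000, 50)).astype(float)
y = 0.5 * G[:, 10] + rng.normal(size=1000)

# 80/20 train/test split, as in the evaluation protocol.
n_train = 800
G_tr, G_te = G[:n_train], G[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

# Stand-in for finemapping: pretend the posterior probabilities
# selected SNP 10 as the top causal candidate.
top = 10

# Fit intercept + effect size of the top SNP on the training set.
X_tr = np.column_stack([np.ones(n_train), G_tr[:, top]])
beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# Apply the trained coefficient to the test set; R^2 is the squared
# correlation between predicted and measured phenotype.
y_hat = beta[0] + beta[1] * G_te[:, top]
r2 = np.corrcoef(y_hat, y_te)[0, 1] ** 2
print(f"test R^2 = {r2:.3f}")
```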


The details of the loci are given in Table A in S1 Text . R2 values correspond to the correlation between the phenotype of the test data estimated by each of the three methods and the actual test phenotype. The tools were applied to the training data, the SNP with the highest posterior causal probability was obtained, and this SNP was then used to estimate the test phenotype. There were 9 loci (29%) [ 2 , 7 – 9 , 12 , 13 , 17 , 23 , 29 ] where Finemap-MiXeR obtained a substantially higher R2 than both other methods. For 16 loci (51%) [ 1 , 3 – 5 , 7 , 10 , 11 , 14 – 16 , 19 – 22 , 27 , 30 ], Finemap-MiXeR obtained the best R2, or one very close to the best achieved by one or both of the other methods. There were only 6 loci [ 6 , 18 , 24 – 26 , 28 ] where one of the other methods was better than Finemap-MiXeR: two loci (6%) [ 25 , 28 ] for SuSiE, one locus (3%) [ 24 ] for FINEMAP, and three loci for both [ 6 , 18 , 26 ].

https://doi.org/10.1371/journal.pgen.1011372.g004

Application to Alzheimer’s disease in 19p13.3/ABCA7

The apolipoprotein E (APOE) gene on chromosome 19q13.32 harbors the first, and by far the strongest, genetic risk factor for Alzheimer's disease (ALZ). Additional ALZ-associated signals have been located on chromosome 19, such as the ABCA7 gene in 19p13.3 [ 29 ]. Here, we examined this locus to check whether Finemap-MiXeR is able to detect the ALZ-associated variant rs4147929. For this purpose, we used the summary statistics presented in [ 30 ] and extracted the z-scores of this locus in a 1-megabase region centered on the rs4147929 variant. We also need the A matrix (the weighted version of the LD matrix, as defined before), which we computed from the UKB data described in the "Application to UKB height data" section. Using this A matrix and the z-scores, we ran Finemap-MiXeR and obtained the posterior causal probabilities for the locus, presented in Fig 5A . As shown in this figure, our method successfully detected the causal variant rs4147929.
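Extracting a 1-megabase window of z-scores centered on an index variant is a simple masking operation. The sketch below uses made-up base-pair positions and z-scores purely to illustrate the selection step; `window_around` is a hypothetical helper, not part of the Finemap-MiXeR tool.

```python
import numpy as np

def window_around(positions, z, center_pos, width=1_000_000):
    """Select z-scores of SNPs whose base-pair position lies within
    a window (default 1 Mb) centered on center_pos."""
    positions = np.asarray(positions)
    half = width // 2
    mask = (positions >= center_pos - half) & (positions <= center_pos + half)
    return np.asarray(z)[mask], mask

# Toy positions (bp) and z-scores; index SNP at 1,050,000 bp.
pos = [400_000, 700_000, 1_050_000, 1_400_000, 1_800_000]
z = [0.5, 1.2, 6.3, 2.1, 0.3]
z_win, mask = window_around(pos, z, 1_050_000)
print(list(z_win))  # → [1.2, 6.3, 2.1]
```

The same mask would be applied to the rows and columns of the LD (or A) matrix so that it matches the selected z-scores.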


(A) Posterior causal probabilities of the variants around the rs4147929 variant in 19p13.3 for ALZ. We used the z-scores of this locus in a 1-megabase region centered on rs4147929, taken from the summary statistics in [ 30 ]. The A matrix was computed from the UKB data described in the "Application to UKB height data" section. (B) Posterior causal probabilities of the variants around the rs356220 variant in 4q22 for PD. We used the z-scores of this locus in a 1-megabase region centered on rs356220, taken from the summary statistics in [ 33 ]. The A matrix was computed from the same UKB data.

https://doi.org/10.1371/journal.pgen.1011372.g005

Application to Parkinson’s Disease in 4q22, detection of rs356220 and rs11931074

Previous association studies have shown a strong association with Parkinson's disease (PD) in the 4q22 region [ 31 ]. The strongest association in this locus has been reported for rs356220 in many studies [ 32 ]. This locus was also used as an application in the FINEMAP paper, where the aim was to finemap rs356220 together with an additional SNP (rs7687945) found significant in a conditional analysis by the authors. Here we finemap the same locus using summary statistics obtained from [ 33 ]. We examined a 1-megabase region centered on rs356220 and used the same procedure to obtain the A matrix.

As can be seen in Fig 5B , our method detected the variant rs356220, as FINEMAP did. On the other hand, our method did not detect rs7687945, as FINEMAP did, but instead detected another variant, rs11931074. Note that the association of rs11931074 has also been identified recently in some studies [ 34 ]. Thus, our method detects two variants (with the highest posterior causal probabilities) that have already been validated in independent studies.

Runtime performance and computational complexity

We compared the computational complexity of our methods with that of FINEMAP, SuSiE and SuSiE-RSS using the runtime performance of each method. As presented before, our Finemap-MiXeR method requires O(M 2 ) computations per iteration, and we showed that this can be reduced from O(M 2 ) to O(p c M) while preserving accuracy, where p c << M. In SuSiE, the number of computations per iteration is O(kMN), and in its extension SuSiE-RSS, it is O(kM 2 ). In FINEMAP, the worst-case computation per iteration is O(k 2 M); however, the algorithm is optimized to search only among SNPs with non-negligible posterior probabilities of being causal, using a hash table to avoid recalculating the same configurations. Thus, its complexity is expected to be reduced when the signal (heritability) is low.

We examined the runtime performance of Finemap-MiXeR, SuSiE and FINEMAP using the same data with different parameters. It is important to note that runtimes may differ greatly due to different implementations (FINEMAP 1.4 is distributed as a pre-compiled C++ executable, SuSiE is an R package, and Finemap-MiXeR is implemented in MATLAB). Nevertheless, we can still compare how runtime scales with respect to the parameters k, M, and h 2 . It is worth noting that the computational performance of Finemap-MiXeR, FINEMAP and SuSiE RSS is independent of N, since they use summary statistics, whereas SuSiE requires individual-level data and its computational complexity therefore depends on N. For comparison, the previously defined synthetic data created by hapgen2 (N = 10,000) were used. All tools were run on an HPC system with an Intel Xeon CPU E5-2698 v4 @ 2.20 GHz.

As can be seen in Fig 6 , for Finemap-MiXeR the required runtime increased as the square of M. Similarly, for SuSiE-RSS it increased as the square of M, but it also scaled linearly with k. For SuSiE, the runtime was proportional to M and N and was higher than SuSiE-RSS when N < M, but as M increased, SuSiE became faster than SuSiE-RSS, as expected. The FINEMAP runtime increased in direct proportion to M but was more sensitive to increases in h 2 (expected behavior, as explained above). Furthermore, for SuSiE, SuSiE-RSS and FINEMAP, the runtime increased with the number of causal variants, whereas for Finemap-MiXeR the number of causal variants did not affect runtime. Finally, our extended version, Finemap-MiXeR-PCA, reduced the rate at which runtime grows with M. This is expected, since its computation per iteration is proportional to p c M, where p c is typically on the order of 100 and thus generally much smaller than the locus size M. Although this method spends some time determining eigenvalues and eigenvectors before the iterations start, it is still much faster than Finemap-MiXeR and reduced the rate of increase with M.


Note that these figures were obtained using the synthetic data described in the "Simulation with synthetic data" section. ( A ) Computational performance when varying the number of causal variants (k) from 1 to 10 while keeping M and h2 constant. ( B ) Computational performance when varying the size of the locus (M) from 1000 to 16,000 while keeping the true heritability (h2 = 0.04) and the true number of causal variants (k = 10) constant. (C) Computational performance when varying the true h2 from 0.01 to 0.08 while keeping M = 16,000 and k = 10 constant.

https://doi.org/10.1371/journal.pgen.1011372.g006

The variational Bayesian approach is becoming increasingly popular in statistical genetics due to its flexibility, improved accuracy, and computational efficiency compared to other Bayesian methods. In the present study, we used this approach for finemapping and developed the novel Finemap-MiXeR method.

In comprehensive experiments on synthetic genetic data with different parameters (heritability, number of causal SNPs, locus length), the Finemap-MiXeR method achieved better accuracy than the other methods. The performance improvements were also observed in applications to real genetic data. To this end, we applied the methods to height, using samples from the UKB: we evaluated multiple height-associated loci, varying in heritability and locus length, and observed that our method outperformed the other methods in most scenarios, yielding better accuracy in predicting the phenotype. Furthermore, we validated our method in the ALZ and PD applications.

One of the main reasons for these improvements in accuracy is the MiXeR model's flexibility in Bayesian inference for finemapping, which leads to more accurate detection of causal variants. While the improvement in accuracy over existing methods can be regarded as marginal, we believe that future extensions will yield further improvements. In this paper, we assumed that all SNPs have equal priors and that the hyperparameters are constant across all SNPs. This assumption could be relaxed, and it is also possible to apply enriched priors to improve the method's accuracy.

Another benefit of Finemap-MiXeR is its computational efficiency. Thanks to the MiXeR model and our tractable optimization function, our method's complexity depends only on the size of the locus (M) and does not increase with the number of causal variants or the locus's heritability, unlike the other methods. In particular, although our method's per-iteration cost scales as O(M 2 ), and is thus comparable to SuSiE (O(kMN)) and SuSiE-RSS (O(kM 2 )), it is independent of the number of causal variants. Furthermore, unlike FINEMAP, our method's computational complexity is independent of the heritability. Finally, using Finemap-MiXeR-PCA, it is possible to reduce the complexity to O(p c M), making it linearly scalable with M. Moreover, unlike many other methods, our method does not require computing the inverse of the LD matrix, which can be problematic due to dimensionality and rank deficiency.

The variational Bayesian approach has been used to improve the accuracy of polygenic risk scores (PRS), optimizing the Evidence Lower Bound (ELBO) using a variational Expectation Maximization (EM) algorithm [ 23 ]. Here, we optimized the ELBO using the ADAM algorithm instead of variational EM [ 23 ], leading to better accuracy and computational complexity than the existing finemapping methods. Applying variational Bayesian inference in the context of the MiXeR model to estimate the posterior effect-size distribution of individual SNPs opens broad opportunities for novel applications of this model in statistical genetics. Beyond finemapping, it can be used together with gene-set enrichment analysis, improving the functional interpretation of GWAS findings. Our model can also be extended to cross-ancestry and cross-trait finemapping. In particular, thanks to the flexibility of our optimization procedure, the same framework can be used for further improvements of the Finemap-MiXeR tool, increasing its accuracy by leveraging differential enrichment in functional annotations [ 35 ] and extending it to other applications, e.g., finemapping causal variants underlying multiple traits [ 36 ] or performing cross-ancestry analysis for a single trait. We may combine our mathematical framework with the existing bivariate MiXeR model to optimize the corresponding ELBO and perform cross-trait finemapping [ 18 ], or incorporate enriched priors by combining our method with GSA-MiXeR, an extension of the MiXeR model for gene-set enrichment [ 37 ]. Furthermore, trying different parametric families in the derivation of the ELBO might improve performance further.
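For readers unfamiliar with ADAM, the sketch below shows a generic ADAM ascent step of the kind used to maximize an objective such as the ELBO. It is not the paper's implementation: the objective here is a toy concave quadratic standing in for the ELBO, and all parameter values are illustrative defaults.

```python
import numpy as np

def adam_maximize(grad, x0, steps=2000, lr=0.02,
                  b1=0.9, b2=0.999, eps=1e-8):
    """Generic ADAM ascent: follow the gradient of an objective
    (here standing in for the ELBO) using bias-corrected first and
    second moment estimates of the gradient."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (mean) estimate
    v = np.zeros_like(x)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)          # bias correction
        v_hat = v / (1 - b2**t)
        x = x + lr * m_hat / (np.sqrt(v_hat) + eps)  # ascent step
    return x

# Toy stand-in for an ELBO: concave quadratic with maximum at (1, -2).
grad = lambda x: -2 * (x - np.array([1.0, -2.0]))
x_star = adam_maximize(grad, np.zeros(2))
print(np.round(x_star, 2))
```

In the actual method, `grad` would be the analytic gradient of the ELBO with respect to the variational parameters (and, optionally, the hyperparameters).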

Despite these advantages and promising results, our method has certain limitations. Although it is computationally efficient and shown to scale better than other methods with respect to various parameters, its wall-clock runtime is generally slower than SuSiE RSS's, due to differences in implementation and software optimization. Another point is that our method constructs credible sets only after obtaining the posterior causal probabilities. In future studies, the credible-set concept could also be used during inference, for example by incorporating priors over possible credible sets. Such future work could improve performance and address some existing challenges; for instance, in the current approach, two true causal SNPs may be assigned to the same credible set if they are in high LD. These limitations, however, do not preclude real-world application of our method and its software implementation.

In conclusion, Finemap-MiXeR is a novel and accurate method for finemapping analysis of GWAS data from complex human traits and has strong potential for further extensions.

Supporting information

S1 Notes. Includes all the technical details regarding the derivation of the proposed method.

https://doi.org/10.1371/journal.pgen.1011372.s001

S1 Text. Includes more simulation results.

https://doi.org/10.1371/journal.pgen.1011372.s002

Acknowledgments

This work also used the TSD (Tjeneste for Sensitive Data) facilities, owned by the University of Oslo, operated and developed by the TSD service group at the University of Oslo, IT-Department (USIT, [email protected] ), using resources provided by UNINETT Sigma2—the National Infrastructure for High Performance Computing and Data Storage in Norway.


