The criteria and considerations that reviewers should weigh when answering the questions in the revised JBI critical appraisal tool for RCTs are shown in Table 2. This tool is also available to download as Supplemental Digital Content 1 at https://links.lww.com/SRX/A7.
RoB assessor: | Date of appraisal: | Record number: | ||||
---|---|---|---|---|---|---|
Study author: | Study title: | Study year: | ||||
Internal validity | Choice - comments/justification | Yes | No | Unclear | N/A | |
1 | Was true randomization used for assignment of participants to treatment groups? | □ | □ | □ | □ | |
2 | Was allocation to treatment groups concealed? | □ | □ | □ | □ | |
3 | Were treatment groups similar at the baseline? | □ | □ | □ | □ | |
4 | Were participants blind to treatment assignment? | □ | □ | □ | □ | |
5 | Were those delivering the treatment blind to treatment assignment? | □ | □ | □ | □ | |
6 | Were treatment groups treated identically other than the intervention of interest? | □ | □ | □ | □ | |
7 | Were outcome assessors blind to treatment assignment? | Yes | No | Unclear | N/A | |
Outcome 1 | □ | □ | □ | □ | ||
Outcome 2 | □ | □ | □ | □ | ||
Outcome 3 | □ | □ | □ | □ | ||
Outcome 4 | □ | □ | □ | □ | ||
Outcome 5 | □ | □ | □ | □ | ||
Outcome 6 | □ | □ | □ | □ | ||
Outcome 7 | □ | □ | □ | □ | ||
8 | Were outcomes measured in the same way for treatment groups? | Yes | No | Unclear | N/A | |
Outcome 1 | □ | □ | □ | □ | ||
Outcome 2 | □ | □ | □ | □ | ||
Outcome 3 | □ | □ | □ | □ | ||
Outcome 4 | □ | □ | □ | □ | ||
Outcome 5 | □ | □ | □ | □ | ||
Outcome 6 | □ | □ | □ | □ | ||
Outcome 7 | □ | □ | □ | □ | ||
9 | Were outcomes measured in a reliable way? | Yes | No | Unclear | N/A | |
Outcome 1 | □ | □ | □ | □ | ||
Outcome 2 | □ | □ | □ | □ | ||
Outcome 3 | □ | □ | □ | □ | ||
Outcome 4 | □ | □ | □ | □ | ||
Outcome 5 | □ | □ | □ | □ | ||
Outcome 6 | □ | □ | □ | □ | ||
Outcome 7 | □ | □ | □ | □ | ||
10 | Was follow-up complete and, if not, were differences between groups in terms of their follow-up adequately described and analyzed? | |||||
Outcome 1 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 2 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 3 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 4 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 5 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 6 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 7 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ |
Statistical conclusion validity | Choice - comments/justification | Yes | No | Unclear | N/A | |
---|---|---|---|---|---|---|
11 | Were participants analyzed in the groups to which they were randomized? | |||||
Outcome 1 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 2 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 3 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 4 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 5 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 6 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 7 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
12 | Was appropriate statistical analysis used? | |||||
Outcome 1 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 2 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 3 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 4 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 5 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 6 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
Outcome 7 | Yes | No | Unclear | N/A | ||
Result 1 | □ | □ | □ | □ | ||
Result 2 | □ | □ | □ | □ | ||
Result 3 | □ | □ | □ | □ | ||
13 | Was the trial design appropriate and any deviations from the standard RCT design (individual randomization, parallel groups) accounted for in the conduct and analysis of the trial? | □ | □ | □ | □ | |
Overall appraisal: | Include □ | Exclude □ | Seek further info □ |
Question 1: Was true randomization used for assignment of participants to treatment groups?
Category: Internal validity
Domain: Bias related to selection and allocation
Appraisal: Study level
If participants are not allocated to treatment and control groups by random assignment, there is a risk that the assignment can be influenced by the known characteristics of the participants themselves, which may distort the comparability of the groups (eg, does the intervention group contain more people over the age of 65 than the control group?). True random assignment of participants means that a procedure is used that allocates participants to groups purely by chance, uninfluenced by any of their known characteristics. Reviewers should check the details of the randomization procedure used for allocation of the participants to study groups. Was a true chance (random) procedure used? For example, was a list of random numbers used? Was a computer-generated list of random numbers used? Was a statistician, external to the research team, consulted for the randomization sequence generation? Additionally, reviewers should check that the authors are not stating they used random approaches when they in fact used systematic approaches (such as allocating by day of the week).
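As a concrete illustration of a computer-generated allocation sequence, here is a minimal sketch of permuted-block randomization in Python; it is not part of the JBI tool, and the function name, block size, and seed are illustrative assumptions.

```python
import random

def block_randomization(n_participants, block_size=4, seed=2024):
    """Illustrative permuted-block randomization: each block contains
    equal numbers of 'Treatment' and 'Control' in random order, so
    allocation depends purely on chance while group sizes stay
    balanced throughout recruitment."""
    rng = random.Random(seed)  # seeded so the sequence is reproducible and auditable
    sequence = []
    while len(sequence) < n_participants:
        block = ["Treatment", "Control"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]

print(block_randomization(8))
# e.g. ['Control', 'Treatment', 'Treatment', 'Control', ...]
```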
Question 2: Was allocation to treatment groups concealed?
If those allocating participants to the compared groups are aware of which group is next in the allocation process (ie, the treatment or control group), there is a risk that they may deliberately and purposefully intervene in the allocation of patients. This may result in the preferential allocation of patients to the treatment group or to the control group. This may directly distort the results of the study, as participants no longer have an equal and random chance to belong to each group. Concealment of allocation refers to procedures that prevent those allocating patients from knowing before allocation which treatment or control is next in the allocation process. Reviewers should check the details about the procedure used for allocation concealment. Was an appropriate allocation concealment procedure used? For example, was central randomization used? Were sequentially numbered, opaque, and sealed envelopes used? Were coded drug packs used?
Question 3: Were treatment groups similar at the baseline?
As with question 1, any difference between the known characteristics of participants included in the compared groups constitutes a threat to internal validity. If differences in these characteristics do exist, then there is potential that the effect cannot be attributed to the potential cause (the examined intervention or treatment). This is because the effect may be explained by the differences between participant characteristics and not the intervention/treatment of interest. Reviewers should check the characteristics reported for participants. Are the participants from the compared groups similar with regard to the characteristics that may explain the effect, even in the absence of the cause (eg, age, severity of the disease, stage of the disease, coexisting conditions)? Reviewers should check the proportion of participants with specific relevant characteristics in the compared groups. (Note: Do not only consider the P value for the statistical testing of the differences between groups with regard to the baseline characteristics.)
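For a continuous baseline characteristic, one common way to check balance without relying on P values is the standardized difference between group means. The sketch below is illustrative; the 0.1 threshold mentioned in the comment is a widely used rule of thumb, not a requirement of the JBI tool.

```python
from math import sqrt

def standardized_difference(mean1, sd1, mean2, sd2):
    """Standardized difference for a continuous baseline characteristic:
    (mean1 - mean2) / pooled SD. Unlike a P value, it does not depend
    on sample size; absolute values above roughly 0.1 are often taken
    to signal meaningful imbalance."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Invented example: mean age 64.2 (SD 9.1) vs 61.8 (SD 8.7)
print(round(standardized_difference(64.2, 9.1, 61.8, 8.7), 2))  # 0.27
```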
Question 4: Were participants blind to treatment assignment?
Domain: Bias related to administration of intervention/exposure
Participants who are aware of their allocation to either the treatment or the control may behave, respond, or react differently to their assigned treatment (or control) compared with participants who remain unaware of their allocation. Blinding of participants is a technique used to minimize this risk. Blinding refers to procedures that prevent participants from knowing which group they are allocated. If blinding has been followed, participants are not aware if they are in the group receiving the treatment of interest or if they are in another group receiving the control intervention. Reviewers should check the details reported in the article about the blinding of participants with regard to treatment assignment. Was an appropriate blinding procedure used? For example, were identical capsules or syringes used? Were identical devices used? Be aware of different terms used; blinding is sometimes also called masking.
Question 5: Were those delivering the treatment blind to treatment assignment?
Like question 4, those delivering the treatment who are aware of participant allocation to either treatment or control may treat participants differently compared to those who remain unaware of participant allocation. There is a risk that any potential change in behavior may influence the implementation of the compared treatments, and the results of the study may be distorted. Blinding of those delivering treatment is used to minimize this risk. When this level of blinding has been achieved, those delivering the treatment are not aware if they are treating the group receiving the treatment of interest or if they are treating any other group receiving the control intervention. Reviewers should check the details reported in the article about the blinding of those delivering treatment with regard to treatment assignment. Is there any information in the article about those delivering the treatment? Were those delivering the treatment unaware of the assignments of participants to the compared groups?
Question 6: Were treatment groups treated identically other than the intervention of interest?
To attribute the effect to the cause (assuming there is no bias related to selection and allocation), there should be no difference between the groups in terms of treatment or care received, other than the treatment or intervention controlled by the researchers. If there are other exposures or treatments occurring at the same time as the cause (the treatment or intervention of interest), then the effect can potentially be attributed to something other than the examined cause (the investigated treatment). This is because it is plausible that the effect may be explained by other exposures or treatments that occurred at the same time as the cause. Reviewers should check the reported exposures or interventions received by the compared groups. Are there other exposures or treatments occurring at the same time as the cause? Is it plausible that the effect may be explained by other exposures or treatments occurring at the same time as the cause? Is it clear that there is no other difference between the groups in terms of treatment or care received, other than the treatment or intervention of interest?
Question 7: Were outcome assessors blind to treatment assignment?
Domain: Bias related to assessment, detection, and measurement of the outcome
Appraisal: Outcome level
Like questions 4 and 5, if those assessing the outcomes are aware of participant allocation to either treatment or control, they may assess the outcomes differently from assessors who remain unaware of participant allocation. Therefore, there is a risk that the measurement of outcomes between groups, and hence the results of the study, may be distorted. Blinding of outcome assessors is used to minimize this risk. Reviewers should check the details reported in the article about the blinding of outcome assessors with regard to treatment assignment. Is there any information in the article about outcome assessors? Were those assessing the treatment’s effects on outcomes unaware of the assignments of participants to the compared groups?
Question 8: Were outcomes measured in the same way for treatment groups?
If the outcome is not measured in the same way in the compared groups, there is a threat to the internal validity of a study. Any differences in outcome measurements may be due to the method of measurement employed between the 2 groups and not the intervention/treatment of interest. Reviewers should check whether the outcomes were measured in the same way. Was the same instrument or scale used? Was the measurement timing the same? Were the measurement procedures and instructions the same?
Question 9: Were outcomes measured in a reliable way?
Unreliability of outcome measurements is one threat that weakens the validity of inferences about the statistical relationship between the cause and the effect estimated in a study exploring causal effects. Unreliability of outcome measurements is one of the plausible explanations for errors of statistical inference with regard to the existence and the magnitude of the effect determined by the treatment (cause). Reviewers should check the details about the reliability of the measurement used, such as the number of raters, the training of raters, and the reliability of the intra-rater and the inter-raters within the study (not as reported in external sources). This question is about the reliability of the measurement performed in the study, and not about the validity of the measurement instruments/scales used in the study. Finally, some outcomes may not rely on instruments or scales (eg, death), and reliability of the measurements may need to be assessed in the context of the study being reviewed. (Note: Two other important threats that weaken the validity of inferences about the statistical relationship between the cause and the effect are low statistical power and the violation of the assumptions of statistical tests. These threats are explored within question 12.)
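When two raters assign categorical outcomes, inter-rater reliability is often summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. A minimal sketch with invented ratings:

```python
def cohens_kappa(ratings1, ratings2):
    """Cohen's kappa for two raters:
    kappa = (p_observed - p_expected) / (1 - p_expected),
    where p_expected comes from each rater's marginal category
    frequencies (the agreement expected by chance alone)."""
    n = len(ratings1)
    categories = set(ratings1) | set(ratings2)
    p_obs = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    p_exp = sum((ratings1.count(c) / n) * (ratings2.count(c) / n)
                for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

rater_a = ["improved", "improved", "stable", "worse", "stable", "improved"]
rater_b = ["improved", "stable",   "stable", "worse", "stable", "improved"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.74
```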
Question 10: Was follow-up complete and, if not, were differences between groups in terms of their follow-up adequately described and analyzed?
Domain: Bias related to participant retention
Appraisal: Result level
For this question, follow-up refers to the period from the moment of randomization to any point at which the groups are compared during the trial. This question asks whether there is complete knowledge (eg, measurements, observations) for the entire duration of the trial for all randomly allocated participants. Incomplete follow-up of randomly allocated participants is known as post-assignment attrition. Because RCTs are not perfect, there is almost always some post-assignment attrition, and the focus of this question is on the appropriate exploration of that attrition. If differences exist in post-assignment attrition between the compared groups of an RCT, then there is a threat to the internal validity of the study, because these differences may provide a plausible alternative explanation for the observed effect even in the absence of the cause (the treatment or intervention of interest). It is important to note that, with regard to post-assignment attrition, it is not enough to know the number and proportion of participants with incomplete data; the reasons for loss to follow-up are essential in the analysis of risk of bias.
Reviewers should check whether there were differences with regard to the loss to follow-up between the compared groups. If follow-up was incomplete (incomplete information on all participants), examine the reported details about the strategies used to address incomplete follow-up. This can include descriptions of loss to follow-up (eg, absolute numbers, proportions, reasons for loss to follow-up) and impact analyses (the analyses of the impact of loss to follow-up on results). Was there a description of the incomplete follow-up including the number of participants and the specific reasons for loss to follow-up? Even if follow-up was incomplete but balanced between groups, if the reasons for loss to follow-up are different (eg, side effects caused by the intervention of interest), these may impose a risk of bias if not appropriately explored in the analysis. If there are differences between groups with regard to the loss to follow-up (numbers/proportions and reasons), was there an analysis of patterns of loss to follow-up? If there are differences between the groups with regard to the loss to follow-up, was there an analysis of the impact of the loss to follow-up on the results? (Note: Question 10 is not about intention-to-treat [ITT] analysis; question 11 is about ITT analysis.)
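The kind of group-by-group description reviewers should look for (absolute numbers, proportions, and reasons) can be sketched as follows; all figures are invented. Note how attrition that looks modest in both arms can still hide imbalanced reasons, here side effects concentrated in one arm.

```python
from collections import Counter

def summarize_attrition(n_randomized, lost_reasons):
    """Describe loss to follow-up for one group: absolute number,
    proportion, and a tally of the stated reasons."""
    n_lost = len(lost_reasons)
    return {
        "randomized": n_randomized,
        "lost": n_lost,
        "proportion_lost": round(n_lost / n_randomized, 3),
        "reasons": Counter(lost_reasons),
    }

intervention = summarize_attrition(120, ["side effects"] * 9 + ["moved away"] * 3)
control = summarize_attrition(118, ["moved away"] * 4)
print(intervention)  # 12/120 lost (10%), mostly side effects
print(control)       # 4/118 lost (3.4%), reasons unrelated to treatment
```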
Question 11: Were participants analyzed in the groups to which they were randomized?
Category: Statistical conclusion validity
This question is about the ITT analysis. There are different statistical analysis strategies available for the analysis of data from RCTs, such as ITT, per-protocol analysis, and as-treated analysis. In the ITT analysis, the participants are analyzed in the groups to which they were randomized. This means that regardless of whether participants received the intervention or control as assigned, were compliant with their planned assignment, or participated for the entire study duration, they are still included in the analysis. The ITT analysis compares the outcomes for participants from the initial groups created by the initial random allocation of participants to those groups. Reviewers should check whether an ITT analysis was reported and the details of the ITT. Were participants analyzed in the groups to which they were initially randomized, regardless of whether they participated in those groups and regardless of whether they received the planned interventions?
Note: ITT analysis is the type of statistical analysis recommended in the Consolidated Standards of Reporting Trials (CONSORT) statement on best practices in trial reporting, and it is considered a marker of good methodological quality in the analysis of the results of a randomized trial. ITT estimates the effect of offering the intervention (ie, the effect of instructing the participants to use or take the intervention); it does not estimate the effect of actually receiving the intervention of interest.
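The contrast between ITT and per-protocol analysis can be made concrete with a minimal sketch; the participant schema ('arm', 'adhered', 'outcome') is an assumption made purely for illustration.

```python
def group_means(participants, strategy="ITT"):
    """Average outcomes by randomized arm. ITT keeps every randomized
    participant; per-protocol keeps only those who adhered to their
    assigned intervention."""
    groups = {}
    for p in participants:
        if strategy == "per-protocol" and not p["adhered"]:
            continue  # per-protocol excludes non-adherent participants
        groups.setdefault(p["arm"], []).append(p["outcome"])
    return {arm: sum(vals) / len(vals) for arm, vals in groups.items()}

participants = [
    {"arm": "treatment", "adhered": True,  "outcome": 1},
    {"arm": "treatment", "adhered": False, "outcome": 0},  # still counted under ITT
    {"arm": "control",   "adhered": True,  "outcome": 0},
    {"arm": "control",   "adhered": True,  "outcome": 1},
]
print(group_means(participants, "ITT"))           # uses all randomized participants
print(group_means(participants, "per-protocol"))  # drops the non-adherent participant
```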
Question 12: Was appropriate statistical analysis used?
Inappropriate statistical analysis may cause errors of statistical inference with regard to the existence and the magnitude of the effect determined by the treatment (cause). Low statistical power and the violation of the assumptions of statistical tests are 2 important threats that weaken the validity of inferences about the statistical relationship between the cause and the effect. Reviewers should check the following aspects: if the assumptions of the statistical tests were respected; if appropriate statistical power analysis was performed; if appropriate effect sizes were used; if appropriate statistical methods were used given the nature of the data and the objectives of statistical analysis (eg, association between variables, prediction, survival analysis).
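As one concrete example of a power consideration, the normal-approximation formula for the required sample size per group in a two-sample comparison of means is n = 2((z_{1-alpha/2} + z_{1-beta}) / d)^2, where d is the standardized effect size. A minimal sketch (the exact t-test calculation would add one or two participants per group):

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    with d the standardized effect size (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

print(round(n_per_group(0.5)))  # about 63 participants per group for d = 0.5
```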
Question 13: Was the trial design appropriate and any deviations from the standard RCT design (individual randomization, parallel groups) accounted for in the conduct and analysis of the trial?
The typical, parallel group RCT may not always be appropriate depending on the nature of the question. Therefore, some additional RCT designs may have been employed that come with their own additional considerations.
Crossover trials should only be conducted with people with a chronic, stable condition, where the intervention produces a short-term effect (eg, relief in symptoms). Crossover trials should ensure there is an appropriate period of washout between treatments. This may also be considered under question 6.
Cluster RCTs randomize groups of individuals (eg, communities, hospital wards), known as clusters, rather than randomizing individuals directly. When outcomes are assessed at the individual level in cluster trials, there are unit-of-analysis issues, because individuals within a cluster are correlated. This should be considered by the study authors when conducting the analysis, and ideally the authors will report the intra-cluster correlation coefficient. This may also be considered under question 12.
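The unit-of-analysis issue is commonly quantified with the design effect, DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intra-cluster correlation coefficient; dividing the total sample size by DEFF gives the effective number of independent observations. A minimal sketch with invented numbers:

```python
def design_effect(cluster_size, icc):
    """Design effect for a cluster-randomized trial:
    DEFF = 1 + (m - 1) * ICC, where m is the average cluster size.
    Effective sample size = total sample size / DEFF."""
    return 1 + (cluster_size - 1) * icc

n_total, m, icc = 800, 20, 0.05
deff = design_effect(m, icc)
print(deff)                   # 1.95: each observation carries about half its nominal weight
print(round(n_total / deff))  # about 410 effectively independent observations
```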
Stepped-wedge RCTs may be appropriate to establish when and how a beneficial intervention may be best implemented within a defined setting, or due to logistical, practical, or financial considerations in the rollout of a new treatment/intervention. Data analysis in these trials should be conducted appropriately, considering the effects of time. This may also be considered under question 12.
Randomized controlled studies are the ideal, and often the only, included study design for systematic reviews assessing the effectiveness of interventions. All included studies must undergo rigorous critical appraisal, which, in the case of quantitative study designs, is predominantly focused on assessment of risk of bias in the conduct of the study. The revised JBI critical appraisal tool for RCTs presents an adaptable and robust new method for assessing this risk of bias. The tool has been designed to complement recent advancements in the field while maintaining its easy-to-follow questions. The revised JBI critical appraisal tool for RCTs offers systematic reviewers an improved and up-to-date method to assess the risk of bias for RCTs included in their systematic review.
The authors thank the JBI Scientific Committee members for their feedback and contributions regarding the concept of this work and both the draft and final manuscript.
Coauthor Catalin Tufanaru passed away July 29, 2021.
MK is supported by the INTER-EXCELLENCE grant number LTC20031—Towards an International Network for Evidence-based Research in Clinical Health Research in the Czech Republic.
ZM is supported by an NHMRC Investigator Grant, APP1195676.
critical appraisal tool; methodological quality; methodology; randomized controlled trial; risk of bias
Qualitative research, in contrast to quantitative research, analyzes words and text instead of numbers and figures. Qualitative research is exploratory and non-experimental. It seeks to explore meaning, experiences, and phenomena among study participants. Qualitative data is generated from participants' stories, open-ended responses, and viewpoints collected from focus groups, interviews, observations, or detailed records (Schmidt & Brown, 2019, pp. 221-224).
Schmidt N. A. & Brown J. M. (2019). Evidence-based practice for nurses: Appraisal and application of research (4th ed.). Jones & Bartlett Learning.
Each JBI Checklist provides tips and guidance on what to look for to answer each question. These tips begin on page 4.
Below are some additional Frequently Asked Questions about the Qualitative Research Checklist that have been asked by students in previous semesters.
Frequently Asked Question | Response |
---|---|
What does it mean for the elements of a qualitative study to be congruent? | In a qualitative study, it is important that all elements of the study - the objectives, methods, theoretical/conceptual framework, qualitative data gathering - all fit together in agreement and that they make sense. Please see page 4 of the JBI Qualitative Checklist for explanatory notes for each question, which elaborate on this concept further. |
Where can I find the researcher's cultural or theoretical background? | Sometimes, authors of a qualitative study will provide details about their own cultural or theoretical background. Look for this information in the beginning of the study or in the methods section. |
For more help: Each JBI Checklist provides detailed guidance on what to look for to answer each question on the checklist. These explanatory notes begin on page four of each Checklist. Please review these carefully as you conduct critical appraisal using JBI tools.
Danford, C. A. (2023). Understanding the evidence: Qualitative research designs. Urologic Nursing, 43(1), 41–45. https://doi.org/10.7257/2168-4626.2023.43.1.41
Doyle, L., McCabe, C., Keogh, B., Brady, A., & McCann, M. (2020). An overview of the qualitative descriptive design within nursing research. Journal of Research in Nursing, 25(5), 443–455. https://doi.org/10.1177/1744987119880234
Luciani, M., Jack, S. M., Campbell, K., Orr, E., Durepos, P., Li, L., Strachan, P., & Di Mauro, S. (2019). An introduction to qualitative health research. Professioni infermieristiche, 72(1), 60–68.
Why is this information important?
The resources on this page will guide you to alternative measures, tools, and other means you can use to assess qualitative research. On this page you'll find: LEGEND (Let Evidence Guide Every New Decision) assessment tools from Cincinnati Children's Hospital; the EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research); and other tools for assessing qualitative research.
Evidence Evaluation Tools and Resources
This website has a number of resources for evaluating health sciences research across a variety of designs/study types, including an Evidence Appraisal form for qualitative research (in table form), as well as forms for mixed methods studies from a variety of clinical question domains.
The EQUATOR Network is an ‘umbrella’ organisation that brings together researchers, medical journal editors, peer reviewers, developers of reporting guidelines, research funding bodies and other collaborators with mutual interest in improving the quality of research publications and of research itself.
The EQUATOR Library contains a comprehensive, searchable database of reporting guidelines for many study types, including qualitative research, and links to other resources relevant to research reporting.
Also see the Articles box below; some of the articles listed there contain checklists or tools.
Most checklists and tools are meant to help you think critically and systematically when appraising research. Users should generally consult the accompanying materials, such as manuals, handbooks, and cited literature, to use these tools appropriately. A broad understanding of the variety and complexity of qualitative research is generally necessary, along with an understanding of the relevant philosophical perspectives and knowledge of specific qualitative research methods and their implementation.
These articles address a range of issues related to understanding and evaluating qualitative research; some include checklists or tools.
Clissett, P. (2008) "Evaluating Qualitative Research." Journal of Orthopaedic Nursing 12: 99-105.
Cohen, Deborah J. and Benjamin F. Crabtree. (2008) "Evidence for Qualitative Research in Health Care: Controversies and Recommendations." Annals of Family Medicine 6(4): 331-339.
Dixon-Woods, M., R.L. Shaw, S. Agarwal, and J.A. Smith. (2004) "The Problem of Appraising Qualitative Research." Qual Saf Health Care 13: 223-225.
Fossey, E., C. Harvey, F. McDermott, and L. Davidson. (2002) "Understanding and Evaluating Qualitative Research." Australian and New Zealand Journal of Psychiatry 36(6): 717-732.
Hammarberg, K., M. Kirkman, S. de Lacey. (2016) "Qualitative Research Methods: When to Use and How to Judge them." Human Reproduction 31 (3): 498-501.
Lee, J. (2014) "Genre-Appropriate Judgments of Qualitative Research." Philosophy of the Social Sciences 44(3): 316-348. (This provides three strategies for evaluating qualitative research: two that the author finds unsatisfactory and one that he considers more appropriate and accurate.)
Majid, Umair and Vanstone, Meredith (2018). "Appraising Qualitative Research for Evidence Syntheses: A Compendium of Quality Appraisal Tools." Qualitative Health Research 28(13): 2115-2131. PMID: 30047306 DOI: 10.1177/1049732318785358
Meyrick, Jane. (2006) "What is Good Qualitative Research? A First Step towards a Comprehensive Approach to Judging Rigour/Quality." Journal of Health Psychology 11(5): 799-808.
Miles, MB, AM Huberman, and J Saldana. (2014) Qualitative Data Analysis. Thousand Oaks, California: SAGE Publications, Inc. Chapter 11: Drawing and Verifying Conclusions.
Morse, JM. (1997) "Perfectly Healthy but Dead: The Myth of Inter-Rater Reliability." Qualitative Health Research 7(4): 445-447.
O’Brien BC, Harris IB, Beckman TJ, et al. (2014) Standards for reporting qualitative research: a synthesis of recommendations . Acad Med 89(9):1245–1251. DOI: 10.1097/ACM.0000000000000388 PMID: 24979285
The Standards for Reporting Qualitative Research (SRQR) consists of 21 items. The authors define and explain key elements of each item and provide examples from recently published articles to illustrate ways in which the standards can be met. The SRQR aims to improve the transparency of all aspects of qualitative research by providing clear standards for reporting qualitative research. These standards will assist authors during manuscript preparation, editors and reviewers in evaluating a manuscript for potential publication, and readers when critically appraising, applying, and synthesizing study findings.
Ryan, Frances, Michael Coughlin, and Patricia Cronin. (2007) "Step by Step Guide to Critiquing Research: Part 2, Qualitative Research." British Journal of Nursing 16(12): 738-744.
Stige, B, K. Malterud, and T. Midtgarden. (2009) "Toward an Agenda for Evaluation of Qualitative Research." Qualitative Health Research 19(10): 1504-1516.
Tong, Allison and Mary Amanda Dew. (2016, Epub ahead of print) "Qualitative Research in Transplantation: Ensuring Relevance and Rigor." Transplantation.
Allison Tong, Peter Sainsbury, Jonathan Craig; Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups , International Journal for Quality in Health Care , Volume 19, Issue 6, 1 December 2007, Pages 349–357, https://doi.org/10.1093/intqhc/mzm042
The criteria included in COREQ, a 32-item checklist, can help researchers to report important aspects of the research team, study methods, context of the study, findings, analysis and interpretations. Items most frequently included in the checklists related to sampling method, setting for data collection, method of data collection, respondent validation of findings, method of recording data, description of the derivation of themes and inclusion of supporting quotations. We grouped all items into three domains: (i) research team and reflexivity, (ii) study design and (iii) data analysis and reporting.
Tracy, Sarah (2010) “Qualitative Quality: Eight ‘Big-Tent’ Criteria for Excellent Qualitative Research.” Qualitative Inquiry 16(10):837-51
Not a checklist, this is a thorough discussion of assessing the scientific merit of a study based on in-depth interviews or participant observation, first by assessing exposure (e.g., time spent in the field). Then, assuming sufficient exposure, the author proposes looking for signs of the eight "big-tent" criteria: worthy topic, rich rigor, sincerity, credibility, resonance, significant contribution, ethics, and meaningful coherence.
Introduction: Systematic reviews provide a rigorous synthesis of the best available evidence regarding a certain question. Where high-quality evidence is lacking, systematic reviewers may choose to rely on case series studies to provide information in relation to their question. However, to date there has been limited guidance on how to incorporate case series studies within systematic reviews assessing the effectiveness of an intervention, particularly with reference to assessing the methodological quality or risk of bias of these studies.
Methods: An international working group was formed to review the methodological literature regarding case series as a form of evidence for inclusion in systematic reviews. The group then developed a critical appraisal tool based on the epidemiological literature relating to bias within these studies. This was then piloted, reviewed, and approved by JBI's international Scientific Committee.
Results: The JBI critical appraisal tool for case series studies includes 10 questions addressing the internal validity and risk of bias of case series designs, particularly confounding, selection, and information bias, in addition to the importance of clear reporting.
Conclusion: In certain situations, case series designs may represent the best available evidence to inform clinical practice. The JBI critical appraisal tool for case series offers systematic reviewers an approved method to assess the methodological quality of these studies.
Methodological quality (risk of bias) assessment is an important step before a study's findings are used. Accurately judging the study type is therefore the first priority, and choosing the proper tool is equally important. In this review, we introduce methodological quality assessment tools for randomized controlled trials (including individual and cluster), animal studies, non-randomized interventional studies (including follow-up studies, controlled before-and-after studies, before-after/pre-post studies, uncontrolled longitudinal studies, and interrupted time series studies), cohort studies, case-control studies, cross-sectional studies (including analytical and descriptive), observational case series and case reports, comparative effectiveness research, diagnostic studies, health economic evaluations, prediction studies (including predictor finding studies, prediction model impact studies, and prognostic prediction model studies), qualitative studies, outcome measurement instruments (including patient-reported outcome measure development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness), systematic reviews and meta-analyses, and clinical practice guidelines. Readers of this review can use it to distinguish the types of medical studies and to choose appropriate tools. In short, comprehensively mastering the relevant knowledge and accumulating practice are the basic requirements for correctly assessing methodological quality.
In the twentieth century, pioneering work by distinguished professors Cochrane A [ 1 ], Guyatt GH [ 2 ], and Chalmers IG [ 3 ] led us into the evidence-based medicine (EBM) era. In this era, knowing how to search, critically appraise, and use the best evidence is important. Moreover, systematic review and meta-analysis is the most widely used method for summarizing primary data scientifically [ 4 – 6 ] and, according to the Institute of Medicine (IOM), the basis for developing clinical practice guidelines [ 7 ]. Hence, before performing a systematic review and/or meta-analysis, assessing the methodological quality of the underlying primary studies is important; likewise, the methodological quality of the review itself should be assessed before its findings are used. Quality includes internal and external validity, while methodological quality usually refers to internal validity [ 8 , 9 ]. Internal validity is also termed “risk of bias (RoB)” by the Cochrane Collaboration [ 9 ].
There are three types of tools: scales, checklists, and items [ 10 , 11 ]. In 2015, Zeng et al. [ 11 ] investigated methodological quality tools for randomized controlled trials (RCTs), non-randomized clinical intervention studies, cohort studies, case-control studies, cross-sectional studies, case series, diagnostic accuracy studies (also called “diagnostic test accuracy (DTA)” studies), animal studies, systematic reviews and meta-analyses, and clinical practice guidelines (CPGs). Since then, pre-existing tools may have changed and new tools may have emerged; moreover, research methods have continued to develop. Hence, it is necessary to systematically survey the commonly used tools for assessing methodological quality, especially those for economic evaluations, clinical prediction rules/models, and qualitative studies. This narrative review therefore presents methodological quality (including “RoB”) assessment tools for primary and secondary medical studies up to December 2019; Table 1 presents their basic characteristics. We hope this review can help the producers, users, and researchers of evidence.
The basic characteristics of the included methodological quality (risk of bias) assessment tools
No. | Development Organization | Tool’s name | Type of study |
---|---|---|---|
1 | The Cochrane Collaboration | Cochrane RoB tool and RoB 2.0 tool | Randomized controlled trial; Diagnostic accuracy study |
2 | The Physiotherapy Evidence Database (PEDro) | PEDro scale | Randomized controlled trial |
3 | The Effective Practice and Organisation of Care (EPOC) Group | EPOC RoB tool | Randomized controlled trial; Clinical controlled trial; Controlled before-and-after study; Interrupted time series study |
4 | The Critical Appraisal Skills Programme (CASP) | CASP checklist | Randomized controlled trial; Cohort study; Case-control study; Cross-sectional study; Diagnostic test study; Clinical prediction rule; Economic evaluation; Qualitative study; Systematic review |
5 | The National Institutes of Health (NIH) | NIH quality assessment tool | Controlled intervention study; Cohort study; Cross-sectional study; Case-control study; Before-after (pre-post) study with no control group; Case series (interventional); Systematic review and meta-analysis |
6 | The Joanna Briggs Institute (JBI) | JBI critical appraisal checklist | Randomized controlled trial; Non-randomized experimental study; Cohort study; Case-control study; Cross-sectional study; Prevalence data; Case reports; Economic evaluation; Qualitative study; Text and expert opinion papers; Systematic reviews and research syntheses |
7 | The Scottish Intercollegiate Guidelines Network (SIGN) | SIGN methodology checklist | Randomized controlled trial; Cohort study; Case-control study; Diagnostic study; Economic evaluation; Systematic reviews and meta-analyses |
8 | The Stroke Therapy Academic Industry Roundtable (STAIR) Group | CAMARADES tool | Animal study |
9 | The SYstematic Review Center for Laboratory animal Experimentation (SYRCLE) | SYRCLE’s RoB tool | Animal study |
10 | Sterne JAC et al. | ROBINS-I tool | Non-randomised interventional study |
11 | Slim K et al. | MINORS tool | Non-randomised interventional study |
12 | The Canada Institute of Health Economics (IHE) | IHE quality appraisal tool | Case series (interventional) |
13 | Wells GA et al. | Newcastle-Ottawa Scale (NOS) | Cohort study; Case-control study |
14 | Downes MJ et al. | AXIS tool | Cross-sectional study |
15 | The Agency for Healthcare Research and Quality (AHRQ) | AHRQ methodology checklist | Cross-sectional/prevalence study |
16 | Crombie I | Crombie’s items | Cross-sectional study |
17 | The Good Research for Comparative Effectiveness (GRACE) Initiative | GRACE checklist | Comparative effectiveness research |
18 | Whiting PF et al. | QUADAS tool and QUADAS-2 tool | Diagnostic accuracy study |
19 | The National Institute for Clinical Excellence (NICE) | NICE methodology checklist | Economic evaluation |
20 | The Cabinet Office | The Quality Framework: Cabinet Office checklist | Qualitative study (social research) |
21 | Hayden JA et al. | QIPS tool | Prediction study (predictor finding study) |
22 | Wolff RF et al. | PROBAST | Prediction study (prediction model study) |
23 | The COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) initiative | COSMIN RoB checklist | Patient-reported outcome measure development; Content validity; Structural validity; Internal consistency; Cross-cultural validity/measurement invariance; Reliability; Measurement error; Criterion validity; Hypotheses testing for construct validity; Responsiveness |
24 | Shea BJ et al. | AMSTAR and AMSTAR 2 | Systematic review |
25 | The Decision Support Unit (DSU) | DSU network meta-analysis (NMA) methodology checklist | Network meta-analysis |
26 | Whiting P et al. | ROBIS tool | Systematic review |
27 | Brouwers MC et al. | AGREE instrument and AGREE II instrument | Clinical practice guideline |
AMSTAR A measurement tool to assess systematic reviews, AHRQ Agency for healthcare research and quality, AXIS Appraisal tool for cross-sectional studies, CASP Critical appraisal skills programme, CAMARADES The collaborative approach to meta-analysis and review of animal data from experimental studies, COSMIN Consensus-based standards for the selection of health measurement instruments, DSU Decision support unit, EPOC The effective practice and organisation of care group, GRACE The good research for comparative effectiveness initiative, IHE Canada institute of health economics, JBI Joanna Briggs Institute, MINORS Methodological index for non-randomized studies, NOS Newcastle-Ottawa scale, NMA Network meta-analysis, NIH National institutes of health, NICE National institute for clinical excellence, PEDro Physiotherapy evidence database, PROBAST The prediction model risk of bias assessment tool, QUADAS Quality assessment of diagnostic accuracy studies, QIPS Quality in prognosis studies, RoB Risk of bias, ROBINS-I Risk of bias in non-randomised studies - of interventions, ROBIS Risk of bias in systematic reviews, SYRCLE Systematic review center for laboratory animal experimentation, STAIR Stroke therapy academic industry roundtable, SIGN The Scottish intercollegiate guidelines network
Randomized controlled trial (individual or cluster).
The first RCT was designed by Hill BA (1897–1991), and the RCT has remained the “gold standard” for experimental study design ever since [ 12 , 13 ]. Nowadays, the Cochrane risk of bias tool for randomized trials (introduced in 2008 and last edited on March 20, 2011), known as “RoB”, is the most commonly recommended tool for RCTs [ 9 , 14 ]. On August 22, 2019, the revised version of this tool (RoB 2.0, first introduced in 2016) was published [ 15 ]. The RoB 2.0 tool is suitable for individually randomized, parallel-group, and cluster-randomized trials and can be found on the dedicated website https://www.riskofbias.info/welcome/rob-2-0-tool . The RoB 2.0 tool consists of five bias domains and shows major changes compared with the original Cochrane RoB tool (Table S 1 A-B presents the major items of both versions).
The Physiotherapy Evidence Database (PEDro) scale is a specialized methodological assessment tool for RCTs in physiotherapy [ 16 , 17 ]; it can be found at http://www.pedro.org.au/english/downloads/pedro-scale/ and covers 11 items (Table S 1 C). The Effective Practice and Organisation of Care (EPOC) Group is a Cochrane Review Group that has also developed a tool (the “EPOC RoB tool”) for randomized trials of complex interventions. This tool has 9 items (Table S 1 D) and can be found at https://epoc.cochrane.org/resources/epoc-resources-review-authors . The Critical Appraisal Skills Programme (CASP) is part of the Oxford Centre for Triple Value Healthcare Ltd (3V) portfolio, which provides resources and learning and development opportunities to support the development of critical appraisal skills in the UK ( http://www.casp-uk.net/ ) [ 18 – 20 ]. The CASP checklist for RCTs consists of three sections involving 11 items (Table S 1 E). The National Institutes of Health (NIH) also develops quality assessment tools for controlled intervention studies (Table S 1 F) to assess the methodological quality of RCTs ( https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools ).
The Joanna Briggs Institute (JBI) is an independent, international, not-for-profit research and development organization based in the Faculty of Health and Medical Sciences at the University of Adelaide, South Australia ( https://joannabriggs.org/ ). It develops many critical appraisal checklists addressing the feasibility, appropriateness, meaningfulness, and effectiveness of healthcare interventions. Table S 1 G presents the JBI critical appraisal checklist for RCTs, which includes 13 items.
The Scottish Intercollegiate Guidelines Network (SIGN) was established in 1993 ( https://www.sign.ac.uk/ ). Its objective is to improve the quality of health care for patients in Scotland by reducing variation in practice and outcomes, through developing and disseminating national clinical guidelines containing recommendations for effective practice based on current evidence. It likewise develops critical appraisal checklists for assessing the methodological quality of different study types, including RCTs (Table S 1 H).
In addition, the Jadad Scale [ 21 ], Modified Jadad Scale [ 22 , 23 ], Delphi List [ 24 ], Chalmers Scale [ 25 ], National Institute for Clinical Excellence (NICE) methodology checklist [ 11 ], Downs & Black checklist [ 26 ], and other tools summarized by West et al. in 2002 [ 27 ] are not commonly used or recommended nowadays.
Before clinical trials begin, the safety and effectiveness of new drugs are usually tested in animal models [ 28 ], so animal studies are considered preclinical research of considerable importance [ 29 , 30 ]. Likewise, the methodological quality of animal studies needs to be assessed [ 30 ]. In 1999, the initial “Stroke Therapy Academic Industry Roundtable (STAIR)” recommended criteria for assessing the quality of stroke animal studies [ 31 ]; this tool is also called “STAIR”. In 2009, the STAIR Group updated their criteria and developed the “Recommendations for Ensuring Good Scientific Inquiry” [ 32 ]. Besides, in 2004 Macleod et al. [ 33 ] proposed a 10-point tool based on STAIR to assess the methodological quality of animal studies, also called “CAMARADES” (The Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies); the “S” originally stood for “Stroke” and now stands for “Studies” ( http://www.camarades.info/ ). In the CAMARADES tool, each item is worth one point, for a maximum total score of 10 points (Table S 1 J).
In 2008, the SYstematic Review Center for Laboratory animal Experimentation (SYRCLE) was established in the Netherlands; in 2014, this team developed and released an RoB tool for animal intervention studies, SYRCLE’s RoB tool, based on the original Cochrane RoB tool [ 34 ]. This tool contains 10 items and has become the most recommended tool for assessing the methodological quality of animal intervention studies (Table S 1 I).
In clinical research, RCTs are not always feasible [ 35 ]; therefore, non-randomized designs remain important. In non-randomised studies (also called quasi-experimental studies), including follow-up studies, investigators control the allocation of participants into groups but do not use randomization [ 36 ]. Depending on whether a comparison group is included, non-randomized clinical intervention studies can be divided into comparative and non-comparative sub-types. The Risk Of Bias In Non-randomised Studies - of Interventions (ROBINS-I) tool [ 37 ] is the preferentially recommended tool; it was developed to evaluate the risk of bias in estimating the comparative effectiveness (harm or benefit) of interventions in studies that do not use randomization to allocate units (individuals or clusters of individuals) into comparison groups. Besides, the JBI critical appraisal checklist for quasi-experimental studies (non-randomized experimental studies), which includes 9 items, is also suitable. Moreover, the methodological index for non-randomized studies (MINORS) tool [ 38 ] can also be used; it contains 12 methodological items, of which the first 8 apply to both non-comparative and comparative studies, while the last 4 apply only to studies with two or more groups. Each item is scored from 0 to 2, giving a maximum overall score of 16 for non-comparative studies and 24 for comparative studies (a worked example follows below). Table S 1 K-L-M presents the major items of these three tools.
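A small sketch of how the MINORS total works under this scoring scheme (0 = not reported, 1 = reported but inadequate, 2 = reported and adequate); the function and the example item scores are illustrative.

```python
def minors_total(item_scores, comparative):
    """Sum MINORS item scores. Non-comparative studies use the first
    8 items (maximum 16); comparative studies use all 12 (maximum 24).
    Each item: 0 = not reported, 1 = reported but inadequate,
    2 = reported and adequate."""
    n_items = 12 if comparative else 8
    if len(item_scores) != n_items or any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError(f"expected {n_items} items, each scored 0-2")
    return sum(item_scores), 2 * n_items

score, maximum = minors_total([2, 2, 1, 2, 0, 2, 1, 2], comparative=False)
print(f"{score}/{maximum}")  # 12/16
```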
Non-randomized studies with a separate control group may also be called clinical controlled trials or controlled before-and-after studies. For this design type, the EPOC RoB tool is suitable (see Table S 1 D). When using this tool, the “random sequence generation” and “allocation concealment” items should be scored as “High risk”, while the other items can be graded in the same way as for a randomized trial.
Non-randomized studies without a separate control group include before-after (pre-post) studies, case series (uncontrolled longitudinal studies), and interrupted time series studies. A case series describes a series of individuals, who usually receive the same intervention, and contains no control group [ 9 ]. There are several tools for assessing the methodological quality of case series studies. The most recent was developed in 2012 by Moga C et al. [ 39 ] at the Canada Institute of Health Economics (IHE) using a modified Delphi technique; hence, it is also called the “IHE Quality Appraisal Tool” (Table S 1 N). Moreover, the NIH also provides a quality assessment tool for case series studies, including 9 items (Table S 1 O). For interrupted time series studies, the “EPOC RoB tool for interrupted time series studies” is recommended (Table S 1 P). For before-after studies, we recommend the NIH quality assessment tool for before-after (pre-post) studies without a control group (Table S 1 Q).
In addition, for non-randomized intervention studies, the Reisch tool (Check List for Assessing Therapeutic Studies) [ 11 , 40 ], the Downs & Black checklist [ 26 ], and other tools summarized by Deeks et al. [ 36 ] are not commonly used or recommended nowadays.
Observational studies include cohort studies, case-control studies, cross-sectional studies, case series, case reports, and comparative effectiveness research [ 41 ], and can be divided into analytical and descriptive studies [ 42 ].
Cohort studies include prospective, retrospective, and ambidirectional designs [ 43 ]. There are several tools for assessing the quality of cohort studies, such as the CASP cohort study checklist (Table S 2 A), the SIGN critical appraisal checklist for cohort studies (Table S 2 B), the NIH quality assessment tool for observational cohort and cross-sectional studies (Table S 2 C), the Newcastle-Ottawa Scale (NOS; Table S 2 D) for cohort studies, and the JBI critical appraisal checklist for cohort studies (Table S 2 E). However, the Downs & Black checklist [ 26 ] and the NICE methodology checklist for cohort studies [ 11 ] are not commonly used or recommended nowadays.
The NOS [ 44 , 45 ] grew out of an ongoing collaboration between the Universities of Newcastle, Australia, and Ottawa, Canada. Among all the tools mentioned above, the NOS is currently the most commonly used, and it may be modified to suit a specific subject.
Case-control studies select participants based on the presence of a specific disease or condition and seek earlier exposures that may have led to the disease or outcome [ 42 ]. They have one advantage over cohort studies: the issue of participant “drop out” or “loss to follow-up” seen in cohort studies does not arise. Nowadays, there are several acceptable tools for assessing the methodological quality of case-control studies, including the CASP case-control study checklist (Table S 2 F), the SIGN critical appraisal checklist for case-control studies (Table S 2 G), the NIH quality assessment tool for case-control studies (Table S 2 H), the JBI critical appraisal checklist for case-control studies (Table S 2 I), and the NOS for case-control studies (Table S 2 J). Among them, the NOS for case-control studies is the most frequently used and may be modified by users.
In addition, the Downs & Black checklist [ 26 ] and the NICE methodology checklist for case-control study [ 11 ] are also not commonly used or recommended nowadays.
Cross-sectional studies provide a snapshot of a disease and other variables in a defined population at a single time point. They can be divided into analytical and purely descriptive types. A descriptive cross-sectional study merely describes the number of cases or events in a particular population at a time point or during a period of time, whereas an analytic cross-sectional study can be used to infer relationships between a disease and other variables [ 46 ].
For assessing the quality of analytical cross-sectional studies, the NIH quality assessment tool for observational cohort and cross-sectional studies (Table S 2 C), the JBI critical appraisal checklist for analytical cross-sectional studies (Table S 2 K), and the Appraisal tool for Cross-Sectional Studies (AXIS tool; Table S 2 L) [ 47 ] are recommended. The AXIS tool, developed in 2016 and containing 20 items, addresses study design and reporting quality as well as the risk of bias in cross-sectional studies. Among these three tools, the JBI checklist is the most preferred.
Purely descriptive cross-sectional studies are usually used to measure disease prevalence and incidence; hence, critical appraisal tools for analytic cross-sectional studies are not appropriate for them. Only a few quality assessment tools suit descriptive cross-sectional studies, such as the JBI critical appraisal checklist for studies reporting prevalence data [ 48 ] (Table S 2 M), the Agency for Healthcare Research and Quality (AHRQ) methodology checklist for cross-sectional/prevalence studies (Table S 2 N), and Crombie’s items for assessing the quality of cross-sectional studies [ 49 ] (Table S 2 O). Among them, the JBI tool is the newest.
Unlike the interventional case series mentioned above, case reports and case series here are used to report novel occurrences of a disease or a unique finding [ 50 ]; hence, they belong to descriptive studies. There is only one tool for these: the JBI critical appraisal checklist for case reports (Table S 2 P).
Comparative effectiveness research (CER) compares the real-world outcomes [ 51 ] of alternative treatment options available for a given medical condition. Its key elements are the study of effectiveness (effect in the real world) rather than efficacy (effect under ideal conditions), and the comparison of alternative strategies [ 52 ]. In 2010, the Good Research for Comparative Effectiveness (GRACE) Initiative was established; it developed principles to help healthcare providers, researchers, journal readers, and editors evaluate the inherent quality of observational CER studies [ 41 ]. In 2016, a validated assessment tool, the GRACE Checklist v5.0 (Table S2Q), was released for assessing the quality of CER.
Diagnostic tests, the subject of diagnostic test accuracy (DTA) studies, are used by clinicians to determine whether a condition is present in a patient so that an appropriate treatment plan can be developed [ 53 ]. DTA studies have several unique design features that differ from standard interventional and observational evaluations. In 2003, Whiting et al. [ 53 , 54 ] developed a tool for assessing the quality of DTA studies, the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool; in 2011, the revised QUADAS-2 tool (Table S2R) was launched [ 55 , 56 ]. Besides, the CASP diagnostic checklist (Table S2S), the SIGN critical appraisal checklist for diagnostic studies (Table S2T), the JBI critical appraisal checklist for diagnostic test accuracy studies (Table S2U), and the Cochrane risk of bias tool for diagnostic test accuracy (Table S2V) are also commonly used in this field.
Of these, the Cochrane risk of bias tool ( https://methods.cochrane.org/sdt/ ) is based on the QUADAS tool, while the SIGN and JBI tools are based on QUADAS-2. QUADAS-2 is the first-choice recommendation. Other relevant tools reviewed by Whiting et al. [ 53 ] in 2004 are no longer in use.
Health economic evaluation.
Health economic evaluation research comparatively analyzes alternative interventions with regard to their resource use, costs, and health effects [ 57 ]. It focuses on identifying, measuring, valuing, and comparing the resource use, costs, and benefit/effect consequences of two or more alternative intervention options [ 58 ]. Health economic studies are increasingly popular, and their methodological quality likewise needs to be assessed before the evidence is used. The first tool for such assessment was developed by Drummond and Jefferson in 1996 [ 59 ]; many tools have since been developed based on Drummond's items or their revision [ 60 ], such as the SIGN critical appraisal checklist for economic evaluations (Table S3A), the CASP economic evaluation checklist (Table S3B), and the JBI critical appraisal checklist for economic evaluations (Table S3C). The NICE now retains only one methodology checklist, the one for economic evaluation (Table S3D).
However, we regard the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement [ 61 ] as a reporting guideline rather than a methodological quality assessment tool, so we do not recommend it for assessing the methodological quality of health economic evaluations.
In healthcare, qualitative research aims to understand and interpret individual experiences, behaviours, interactions, and social contexts, so as to explain phenomena of interest, such as the attitudes, beliefs, and perspectives of patients and clinicians; the interpersonal nature of caregiver-patient relationships; the illness experience; and the impact of human suffering [ 62 ]. Compared with those for quantitative studies, assessment tools for qualitative studies are few. The CASP qualitative research checklist (Table S3E) is currently the most frequently recommended tool for this purpose. The JBI critical appraisal checklist for qualitative research [ 63 , 64 ] (Table S3F) and the Quality Framework: Cabinet Office checklist for social research [ 65 ] (Table S3G) are also suitable.
Clinical prediction studies include predictor-finding (prognostic factor) studies, prediction model studies (development, validation, and extension or updating), and prediction model impact studies [ 66 ]. For predictor-finding studies, the Quality In Prognosis Studies (QIPS) tool [ 67 ] can be used to assess methodological quality (Table S3H). For prediction model impact studies with a randomized comparative design, tools for RCTs apply, especially the RoB 2.0 tool; with a nonrandomized comparative design, tools for non-randomized studies apply, especially the ROBINS-I tool. For diagnostic and prognostic prediction model studies, the Prediction model Risk Of Bias ASsessment Tool (PROBAST; Table S3I) [ 68 ] and the CASP clinical prediction rule checklist (Table S3J) are suitable.
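For context, PROBAST's analysis domain asks, among other things, whether a model's discrimination was evaluated appropriately; discrimination is usually reported as the c-statistic, which equals the area under the ROC curve for a binary outcome. The sketch below computes it with scikit-learn on hypothetical validation data; it illustrates the statistic itself, not the PROBAST tool.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical external validation data: observed outcomes (0/1) and model-predicted risks
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0.10, 0.35, 0.60, 0.80, 0.25, 0.30, 0.40, 0.90]

# The c-statistic equals the ROC AUC for a binary outcome
print(f"c-statistic (AUC) = {roc_auc_score(y_true, y_pred):.2f}")
```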
Text- and expert opinion-based evidence (also called "non-research evidence") comes from expert opinions, consensus, current discourse, comments, and assumptions or assertions that appear in journals, magazines, monographs, and reports [ 69 – 71 ]. Currently, only the JBI provides a critical appraisal checklist for text and expert opinion papers (Table S3K).
An outcome measurement instrument is a "device" used to collect a measurement. The term "instrument" is broad: it can refer to a questionnaire (e.g., a patient-reported outcome such as quality of life), an observation (e.g., the result of a clinical examination), a scale (e.g., a visual analogue scale), a laboratory test (e.g., a blood test), or images (e.g., ultrasound or other medical imaging) [ 72 , 73 ]. Measurements can be subjective or objective, and either unidimensional (e.g., attitude) or multidimensional. Currently, only one tool, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) Risk of Bias checklist [ 74 – 76 ] ( www.cosmin.nl/ ), is appropriate for assessing the methodological quality of outcome measurement instruments. Table S3L presents its major items: patient-reported outcome measure (PROM) development (Table S3LA), content validity (Table S3LB), structural validity (Table S3LC), internal consistency (Table S3LD), cross-cultural validity/measurement invariance (Table S3LE), reliability (Table S3LF), measurement error (Table S3LG), criterion validity (Table S3LH), hypotheses testing for construct validity (Table S3LI), and responsiveness (Table S3LJ).
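One of the COSMIN boxes above, internal consistency (Table S3LD), is typically informed by statistics such as Cronbach's alpha. As a minimal sketch of the standard formula (not of the COSMIN checklist itself), assuming item scores arranged as a respondents-by-items array:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 respondents x 4 questionnaire items
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 4, 5],
                   [3, 3, 3, 4],
                   [1, 2, 1, 2]])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```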
Systematic review and meta-analysis.
Systematic reviews and meta-analyses are popular methods for keeping up with the current medical literature [ 4 – 6 ]; their ultimate purpose and value lie in promoting healthcare [ 6 , 77 , 78 ]. A meta-analysis is a statistical process of combining results from several studies and is commonly part of a systematic review [ 11 ]. Naturally, critical appraisal is necessary before a systematic review or meta-analysis is used.
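To make the "combining" step concrete, the sketch below pools hypothetical study effect estimates with fixed-effect inverse-variance weighting; an actual meta-analysis would be run in dedicated software (e.g., RevMan or an R package) and would also examine heterogeneity and random-effects models.

```python
import numpy as np

# Hypothetical effect estimates (e.g., log odds ratios) and their standard errors
effects = np.array([0.30, 0.12, 0.25])
ses = np.array([0.15, 0.10, 0.20])

weights = 1 / ses**2                                   # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)   # weighted average effect
pooled_se = np.sqrt(1 / np.sum(weights))               # standard error of the pooled effect
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se  # 95% CI
print(f"pooled effect = {pooled:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```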
In 1988, Sacks et al. developed the first tool for assessing the quality of meta-analyses of RCTs, the Sacks Quality Assessment Checklist (SQAC) [ 79 ]; in 1991, Oxman and Guyatt developed another, the Overview Quality Assessment Questionnaire (OQAQ) [ 80 , 81 ]. To overcome the shortcomings of these two tools, A Measurement Tool to Assess Systematic Reviews (AMSTAR) was developed on their basis in 2007 [ 82 ] ( http://www.amstar.ca/ ). However, the original AMSTAR instrument did not include an assessment of the risk of bias in non-randomised studies, and the expert group considered that a revision should address all aspects of the conduct of a systematic review. Hence, AMSTAR 2, a new instrument for systematic reviews of randomised or non-randomised studies of healthcare interventions, was released in 2017 [ 83 ]; Table S4A presents its major items.
Besides, the CASP systematic review checklist (Table S4B), the SIGN critical appraisal checklist for systematic reviews and meta-analyses (Table S4C), the JBI critical appraisal checklist for systematic reviews and research syntheses (Table S4D), the NIH quality assessment tool for systematic reviews and meta-analyses (Table S4E), the Decision Support Unit (DSU) network meta-analysis (NMA) methodology checklist (Table S4F), and the Risk of Bias in Systematic Review (ROBIS) tool [ 84 ] (Table S4G) are all suitable. Among them, AMSTAR 2 is the most commonly used and ROBIS is the most frequently recommended.
In terms of scope, AMSTAR 2 is suitable for systematic reviews and meta-analyses of randomised or non-randomised interventional studies, the DSU NMA methodology checklist for network meta-analyses, and ROBIS for meta-analyses of interventional, diagnostic test accuracy, clinical prediction, and prognostic studies.
Clinical practice guidelines (CPGs) are well integrated into the thinking of practicing clinicians and professional clinical organizations [ 85 – 87 ] and help incorporate scientific evidence into clinical practice [ 88 ]. However, not all CPGs are evidence-based [ 89 , 90 ], and their quality is uneven [ 91 – 93 ]. To date, more than 20 appraisal tools have been developed [ 94 ]. Among them, the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument has the greatest potential to serve as the basis of an appraisal tool for clinical pathways [ 94 ]. The AGREE instrument was first released in 2003 [ 95 ] and updated to AGREE II in 2009 [ 96 ] ( www.agreetrust.org/ ). AGREE II is now the most recommended tool for CPGs (Table S4H).
In addition, based on AGREE II, the AGREE Global Rating Scale (AGREE GRS) instrument [ 97 ] was developed as a short-item tool for evaluating the quality and reporting of CPGs.
Currently, EBM is widely accepted, and the major attention of healthcare workers lies in "going from evidence to recommendations" [ 98 , 99 ]. Critical appraisal of evidence before use is therefore a key step in this process [ 100 , 101 ]. In 1987, Mulrow [ 102 ] pointed out that medical reviews should routinely use scientific methods to identify, assess, and synthesize information; hence, methodological quality assessment is necessary before a study is used. However, although more than 20 years have passed since the first tool emerged, many users still confuse methodological quality with reporting quality. Some have used reporting checklists to assess methodological quality, for example applying the Consolidated Standards of Reporting Trials (CONSORT) statement [ 103 ] to assess the methodological quality of RCTs, or the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [ 104 ] to assess the methodological quality of cohort studies. This phenomenon indicates that more universal education in clinical epidemiology is needed for medical students and professionals.
The development of methodological quality assessment tools should accord with the characteristics of different study types. In this review, we searched the NICE, SIGN, Cochrane Library, and JBI websites using the terms "methodological quality", "risk of bias", "critical appraisal", "checklist", "scale", "items", and "assessment tool", and, on that basis, added "systematic review", "meta-analysis", "overview", and "clinical practice guideline" to search PubMed. Compared with our previous systematic review [ 11 ], we found that some tools are recommended and remain in use, some are used without being recommended, and some have been eliminated [ 10 , 29 , 30 , 36 , 53 , 94 , 105 – 107 ]. These tools provide a significant impetus for clinical practice [ 108 , 109 ].
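For reproducibility, the PubMed leg of such a search can be scripted. Below is a minimal sketch assuming Biopython's Entrez interface and a hypothetical combined query (the contact email is a placeholder required by NCBI):

```python
from Bio import Entrez  # Biopython

Entrez.email = "reviewer@example.org"  # placeholder; NCBI requires a contact address

# Hypothetical query combining appraisal terms with review-level terms
query = ('("methodological quality" OR "risk of bias" OR "critical appraisal") '
         'AND ("systematic review" OR "meta-analysis" OR "clinical practice guideline")')

handle = Entrez.esearch(db="pubmed", term=query, retmax=100)
record = Entrez.read(handle)
handle.close()

print(record["Count"], "records; first IDs:", record["IdList"][:5])
```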
In addition, compared with our previous systematic review [ 11 ], this review covers more tools, especially those developed after 2014 and the latest revisions, and we adjusted the method of classifying study types. Firstly, in 2014 the NICE provided seven methodology checklists, but it now retains and updates only the checklist for economic evaluation; moreover, the Cochrane RoB 2.0 tool, the AMSTAR 2 tool, the CASP checklists, and most of the JBI critical appraisal checklists are the newest revisions, while the NIH quality assessment tools, ROBINS-I tool, EPOC RoB tool, AXIS tool, GRACE Checklist, PROBAST, COSMIN Risk of Bias checklist, and ROBIS tool are all newly released. Secondly, we introduced tools for network meta-analysis, outcome measurement instruments, text and expert opinion papers, prediction studies, qualitative studies, health economic evaluations, and CER. Thirdly, we classified interventional studies into randomized and non-randomized subtypes, further dividing non-randomized studies into those with and without a control group; we also classified cross-sectional studies into analytical and purely descriptive subtypes, and case series into interventional and observational subtypes. This approach is more objective and comprehensive.
Obviously, the number of appropriate tools is largest for RCTs, followed by cohort studies; the applicable range of the JBI checklists is the widest [ 63 , 64 ], with CASP following closely. However, further efforts to develop appraisal tools remain necessary. For some study types, such as CER, outcome measurement instruments, text and expert opinion papers, case reports, and CPGs, only one suitable tool exists; for many others, such as overviews, genetic association studies, and cell studies, no proper tool exists at all. Moreover, existing tools have not been fully accepted. Developing well-accepted tools remains significant and important work for the future [ 11 ].
Our review can help producers and users of systematic reviews, meta-analyses, and guidelines choose the best tool when producing or using evidence, and methodologists can draw from it research topics for developing new tools. Most importantly, all assessment tools are subjective, and the actual yield of applying them is influenced by the user's skills and knowledge. Therefore, users must receive formal training (relevant epidemiological knowledge is necessary) and hold a rigorous academic attitude, and at least two independent reviewers should be involved in evaluation and cross-checking to avoid performance bias [ 110 ].
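Where two reviewers appraise independently, their item-level agreement can be quantified before discrepancies are resolved by discussion; chance-corrected statistics such as Cohen's kappa are commonly used for this. A minimal sketch with hypothetical judgments, using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical item-level judgments from two independent appraisers
reviewer_a = ["yes", "yes", "no", "unclear", "yes", "no"]
reviewer_b = ["yes", "no",  "no", "unclear", "yes", "yes"]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa = {kappa:.2f}")  # values above ~0.6 are often read as substantial agreement
```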
Acknowledgements.
The authors thank all the authors and technicians for their hard work in developing methodological quality assessment tools.
Abbreviations.
Abbreviation | Full name
---|---
AGREE GRS | AGREE Global Rating Scale
AGREE | Appraisal of Guidelines for Research and Evaluation
AHRQ | Agency for Healthcare Research and Quality
AMSTAR | A Measurement Tool to Assess Systematic Reviews
AXIS | Appraisal tool for Cross-Sectional Studies
CAMARADES | Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies
CASP | Critical Appraisal Skills Programme
CER | Comparative effectiveness research
CHEERS | Consolidated Health Economic Evaluation Reporting Standards
CONSORT | Consolidated Standards of Reporting Trials
COSMIN | COnsensus-based Standards for the selection of health Measurement INstruments
CPG | Clinical practice guideline
DSU | Decision Support Unit
DTA | Diagnostic test accuracy
EBM | Evidence-based medicine
EPOC | Effective Practice and Organisation of Care Group
GRACE | Good Research for Comparative Effectiveness Initiative
IHE | Institute of Health Economics (Canada)
IOM | Institute of Medicine
JBI | Joanna Briggs Institute
MINORS | Methodological Index for Non-Randomized Studies
NICE | National Institute for Clinical Excellence
NIH | National Institutes of Health
NMA | Network meta-analysis
NOS | Newcastle-Ottawa Scale
OQAQ | Overview Quality Assessment Questionnaire
PEDro | Physiotherapy Evidence Database
PROBAST | Prediction model Risk Of Bias ASsessment Tool
PROM | Patient-reported outcome measure
QIPS | Quality In Prognosis Studies
QUADAS | Quality Assessment of Diagnostic Accuracy Studies
RCT | Randomized controlled trial
RoB | Risk of bias
ROBINS-I | Risk Of Bias In Non-randomised Studies - of Interventions
ROBIS | Risk of Bias in Systematic Review
SIGN | Scottish Intercollegiate Guidelines Network
SQAC | Sacks Quality Assessment Checklist
STAIR | Stroke Therapy Academic Industry Roundtable
STROBE | Strengthening the Reporting of Observational Studies in Epidemiology
SYRCLE | SYstematic Review Centre for Laboratory animal Experimentation
XTZ was responsible for the design of the study and review of the manuscript; LLM, ZHY, YYW, and DH contributed to data collection; LLM, YYW, and HW contributed to the preparation of the article. All authors read and approved the final manuscript.
This work was supported (in part) by the Entrusted Project of the National Health Commission of China (No. [2019]099), the National Key Research and Development Plan of China (2016YFC0106300), and the Natural Science Foundation of Hubei Province (2019FFB03902). The funders had no role in study design, data collection and analysis, the decision to publish, or preparation of the manuscript. The authors declare that there are no conflicts of interest in this study.
Ethics approval and consent to participate.
Not applicable.
Competing interests.
The authors declare that they have no competing interests.
Lin-Lu Ma, Email: 13598615285@163.com .
Yun-Yun Wang, Email: 13545027094@163.com .
Zhi-Hua Yang, Email: yangzhihuaxx@126.com .
Di Huang, Email: 13163248347@163.com .
Hong Weng, Email: wengh92@163.com .
Xian-Tao Zeng, Email: zengxiantao1128@163.com , Email: zengxiantao@whucebtm.com .
Supplementary information accompanies this paper at 10.1186/s40779-020-00238-8.