Tools for Those Who Summarize the Evidence Base
For a few years now, I have been doing work related to the methodological quality (MQ) of meta-analyses and of the studies they integrate, both of which have an increasing presence in the scientific literature. As Rob Low, Hayley MacDonald and I argue in a paper now in press at Psychology & Health, studies and reviews that use stronger methods presumably reach more trustworthy results and clearer knowledge. In theory, such results should be less subject to bias and to threats to validity. Such biases and threats have been recognized for half a century and longer in what are now classic scholarly works (e.g., Campbell & Fiske, 1959; Cook & Campbell, 1976; Shadish, Cook, & Campbell, 2002). The perception that MQ matters and should be kept high (Shadish, 1989) has naturally spawned numerous MQ scales, such that a comprehensive survey focused on non-randomized intervention studies found nearly 200 of them (Deeks et al., 2003).
In our article, Rob, Hayley and I reviewed meta-analyses in three domains of mutual interest: behavioral interventions to reduce the risk of acquiring or transmitting HIV, exercise to improve mental health, and exercise to reduce blood pressure (BP). Each domain has increasingly used MQ dimensions and/or scales (see Figure):
As you can see, the mental health domain adopted MQ practices more quickly than the other two, and HIV prevention still lags somewhat behind. Meta-analyses in these literatures generally use MQ only qualitatively. Rob, Hayley and I conclude that, in literatures that are large enough and that exhibit significant heterogeneity, a wiser strategy is to enter MQ dimensions into interactive models alongside substantive predictors. Such models amount to systematic sensitivity analyses, ensuring that critical moderator results do not depend on the inclusion of studies with poorer methods. Sure, we expect randomized controlled trials (RCTs) to yield better causal evidence about a treatment, but that does not mean such trials necessarily yield the best evidence about all the factors that make effects larger or smaller. We also provide a technical summary explaining how to conduct such analyses, which amounts to an application of simple slopes analysis to meta-regression. We follow the moving constant technique (Johnson & Huedo-Medina, 2011) as a simplification. (Otherwise, it would be necessary to use matrix algebra, something that, in our experience, applied statisticians are not happy to endure.) Once the article is available on ResearchGate, I will update the link here to direct you to it. In the meantime, feel free to write me to request a preprint (as with other papers cited here and from the Systematic Health Action Research Program [SHARP]; follow our Twitter feed here).
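To make the general idea concrete, here is a minimal sketch of a fixed-effect meta-regression with an MQ × moderator interaction, probed via simple slopes by recentering the MQ predictor (the recentering trick is in the spirit of the moving constant technique, though the paper itself should be consulted for the actual procedure). All data, variable names, and the specific moderator are hypothetical illustrations, not values from our reviews.

```python
import numpy as np

def wls_meta_regression(y, v, X):
    """Fixed-effect weighted least squares meta-regression.
    y: effect sizes; v: sampling variances; X: design matrix (with intercept)."""
    W = np.diag(1.0 / v)                 # inverse-variance weights
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta = XtWX_inv @ X.T @ W @ y        # coefficient estimates
    se = np.sqrt(np.diag(XtWX_inv))      # standard errors
    return beta, se

# Hypothetical data: standardized mean differences from k = 8 trials
y = np.array([0.10, 0.25, 0.30, 0.45, 0.20, 0.55, 0.35, 0.60])
v = np.array([0.04, 0.03, 0.05, 0.02, 0.04, 0.03, 0.05, 0.02])
dose = np.array([60, 90, 120, 150, 60, 150, 90, 120])  # substantive moderator
mq = np.array([3, 5, 4, 6, 4, 5, 6, 3])                # MQ score per study

# Simple slope of dose at a chosen MQ level: recenter MQ there and refit,
# so the dose coefficient (and its SE) is the slope at that MQ level.
for mq0 in (3, 6):
    q = mq - mq0
    X = np.column_stack([np.ones_like(y), dose, q, dose * q])
    beta, se = wls_meta_regression(y, v, X)
    print(f"MQ = {mq0}: dose slope = {beta[1]:.4f} (SE = {se[1]:.4f})")
```

If the dose slope holds up at both low and high MQ, the substantive moderator result does not hinge on the weaker studies; if it appears only at one MQ level, that is exactly the kind of dependence the interactive approach is designed to expose.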
Systematic scales of MQ have also been developed for the quality of meta-analyses themselves. With the profusion of reviews, meta-reviews are becoming increasingly popular, and even my SHARP colleagues and I have been writing them, applying MQ scales to the meta-analyses reviewed. Perhaps the earliest such scale is the overview quality assessment questionnaire (OQAQ; Oxman & Guyatt, 1991), and perhaps the most popular is the AMSTAR (Shea et al., 2007), based in part on the OQAQ. AMSTAR is short for “A MeaSurement Tool to Assess the methodological quality of systematic Reviews” (I think we can forgive them the clumsy set of words given the coolness of the acronym itself, can’t we?). In a paper now in press at AIDS and Behavior, Cleo Protogerou and I used the OQAQ to evaluate 11 systematic reviews of behavioral interventions to reduce HIV risk in adolescents. We then coupled these ratings with the degree of support for conclusions about which strategies most reliably decrease risk. Although reviewers had implicated some 30 dimensions as underlying risk reduction, these reduce to 16 if one considers only the strongest reviews’ conclusions. I found this meta-review particularly insightful regarding methodological practice, both others’ and that of our own SHARP reviews. For example, in 2011 our team published what was then the largest meta-analysis in this domain, covering 98 behavioral interventions with 51,240 adolescents aged 11–19 years. We also pride ourselves on high MQ. Yet our meta-analysis scored only 6 of 7 total points on the OQAQ. Moreover, the volume of trials gauged across all 11 reviews is much larger than ours, suggesting our meta-analysis missed trials. Similarly, our review examined only a fraction of the dimensions that previous reviewers had suggested are linked with reduced risk for HIV (and related sexually transmitted infections).
I always advise my students to read prior reviews carefully. Now I need to follow my own advice!
In February, with an interdisciplinary group of scholars, we published in the Journal of Hypertension an application of an augmented version of the AMSTAR to meta-analyses of the BP response to exercise. We found that meta-analyses in this domain have employed increasingly better methods over time, but that no extant meta-analysis has satisfied all AMSTAR MQ dimensions. We also found that meta-analyses have been inconsistent about how beneficial exercise is for reducing BP; moderator dimensions sometimes appear in reversed patterns across meta-analyses. Such work helps to identify MQ flaws that might be responsible for these inconsistencies. We also found that, to date, the scholarly impact of meta-analyses has been unrelated to their scored MQ, which suggests that producers and consumers of meta-analyses alike have a poor sense of what exactly makes a high-MQ meta-analysis. Reviewing the AMSTAR items would seem mandatory for anyone who wants to do systematic reviewing well. (And don't miss the PRISMA guidelines!) One emphasis of the AMSTAR is independent duplication of key steps of the process, such as study selection and data extraction. Indeed, meta-analyses that had at least two authors scored better on the AMSTAR than those that were solo-authored.
Another logical extension of the philosophy that MQ matters is the increasing profusion of meta-analyses and systematic reviews that sample only what are thought to be the strongest studies and omit the others. Such reviews are often called best-evidence syntheses (Slavin, 1986), and this is the default strategy of research syntheses produced by the Cochrane Collaboration. The problem is that, as Valentine (2009) most clearly concluded, MQ scales and most MQ dimensions have at best a murky connection to the results that studies produce. It is known, for example, that lower reliability and validity will decrease observed effect sizes, on average (Hunter & Schmidt, 2004; Johnson & Eagly, 2014). It is not systematically known whether other dimensions routinely increase or decrease the size of observed effects. Therefore, Rob, Hayley and I concluded that meta-analyses should loosen selection criteria to permit studies of presumed lower quality to qualify. As we put it:
Siding with those who wish to treat it as an empirical question (e.g., Lipsey & Wilson, 2001; Valentine, 2009), we believe it is better to determine whether these studies presumed to have lower quality actually do provide different results than to exclude them with pure prejudice.
Following that advice need not mean including all trial types (merging case-control studies with RCTs, say, might truly make things messy). But it might entail including both RCTs and uncontrolled trials and explicitly comparing their results to make sure that this design feature is not undermining findings. Rob, Hayley and I describe findings from several recent SHARP meta-analyses that use this approach in two domains, HIV prevention and exercise and mental health. (To date, it has not been used in meta-analyses of exercise and BP.) For example, in a PLoS ONE meta-analysis, Brown et al. (2012) found a dose-response pattern such that more minutes per week of aerobic exercise were linked to greater reductions in depression for cancer survivors, but this pattern appeared in higher- but not lower-quality studies. As a general trend, it appears that higher-quality studies do have more signal and less noise.
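The "include, then compare" strategy can be sketched as a simple subgroup sensitivity check: pool RCTs and uncontrolled trials separately under a fixed-effect model, then test whether the two pooled estimates differ. The data below are hypothetical, purely to illustrate the mechanics.

```python
import numpy as np
from math import erfc, sqrt

def pooled(y, v):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    w = 1.0 / v
    est = np.sum(w * y) / np.sum(w)
    se = sqrt(1.0 / np.sum(w))
    return est, se

# Hypothetical effect sizes: RCTs vs. uncontrolled (pre-post) trials
y_rct = np.array([0.30, 0.45, 0.25, 0.50])
v_rct = np.array([0.03, 0.02, 0.04, 0.03])
y_unc = np.array([0.55, 0.40, 0.65])
v_unc = np.array([0.05, 0.04, 0.06])

e1, s1 = pooled(y_rct, v_rct)
e2, s2 = pooled(y_unc, v_unc)

# z-test on the difference between subgroup estimates: a non-significant
# difference suggests design (RCT vs. uncontrolled) is not driving results.
z = (e1 - e2) / sqrt(s1**2 + s2**2)
p = erfc(abs(z) / sqrt(2))  # two-sided p-value
print(f"RCTs: {e1:.3f}  Uncontrolled: {e2:.3f}  z = {z:.2f}, p = {p:.3f}")
```

If the subgroups disagree, the meta-analyst reports design-specific estimates (or models design as a moderator); if they agree, the uncontrolled trials can be retained, buying extra statistical power rather than being excluded on principle.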
More systematic exploration of how MQ dimensions operate in combination with substantive results promises improved knowledge about the role of MQ. It might also help scholars think theoretically about how MQ should (or should not) relate to study results. And it may let those of us who like to model big databases achieve even greater statistical power and precision. See our paper for a lengthier discussion of the strengths and weaknesses of the interactive approach to MQ in meta-analysis. Although we introduce it in relation to health promotion meta-analyses, the issues we raise generalize to other domains.
Brown, J. C., Huedo-Medina, T. B., Pestacello, L. S., Pestacello, S. M., Ferrer, R. A., LaCroix, J. M., & Johnson, B. T. (2012). The efficacy of exercise in reducing depressive symptoms among cancer survivors: A meta-analysis. PLoS ONE, 7(1), e30955. (doi:10.1371/journal.pone.0030955)
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Cook, T. D., & Campbell, D. T. (1976). The design and conduct of quasi-experiments and true experiments in field settings. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 223-326). Chicago: Rand McNally.
Deeks, J. J., Dinnes, J., D’amico, R., Sowden, A., Sakarovitch, C., Song, F., . . . Altman, D. (2003). Evaluating non-randomised intervention studies. Health Technology Assessment, 7(27), 1-179.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: Sage.
Johnson, B. T., & Eagly, A. H. (2014). Meta-analysis of social-personality psychological research. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (2nd Ed., pp. 675-707). London: Cambridge University Press.
Johnson, B. T., & Huedo-Medina, T. B. (2011). Depicting estimates using the intercept in meta-regression models: The moving constant technique. Research Synthesis Methods, 2, 204–220. (doi: 10.1002/jrsm.49)
Johnson, B. T., Low, R. E., & MacDonald, H. V. (in press). Panning for the gold in health research: Incorporating studies’ methodological quality in meta-analysis. Psychology & Health.
Johnson, B. T., MacDonald, H. V., Bruneau, M. L., Jr., Goldsby, T. U., Brown, J. C., Huedo-Medina, T. B., & Pescatello, L. S. (2014). Methodological quality of meta-analyses on the blood pressure response to exercise: A review. Journal of Hypertension, 32(4), 706-723. (doi:10.1097/HJH.0000000000000097)
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis (Vol. 49). Thousand Oaks, CA: Sage Publications.
Oxman, A. D., & Guyatt, G. H. (1991). Validation of an index of the quality of review articles. Journal of Clinical Epidemiology, 44, 1271-1278.
Protogerou, C., & Johnson, B. T. (in press). Factors underlying the success of behavioral HIV-prevention interventions for adolescents: A meta-review. AIDS and Behavior.
Shadish, W. R. (1989). The perception and evaluation of quality in science. In B. Gholson, W. R. Shadish, R. A. Neimeyer & A. C. Houts (Eds.), Psychology of Science: Contributions to Metascience (pp. 383-426). New York: Cambridge University Press.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA, US: Houghton, Mifflin and Company.
Shea, B. J., Grimshaw, J. M., Wells, G. A., Boers, M., Andersson, N., Hamel, C., . . . Bouter, L. M. (2007). Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Medical Research Methodology, 7(1), 10.
Slavin, R. E. (1986). Best-evidence synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher, 15(9), 5-11.
Valentine, J. C. (2009). Judging the quality of primary research. In H. Cooper, L. V. Hedges & J. C. Valentine (Eds.), Handbook of Research Synthesis and Meta-Analysis (2nd ed.). New York: Russell Sage Foundation.