Meta-Analysis Resources

Tools for Those Who Summarize the Evidence Base

Resources and networking for those who conduct or interpret meta-analyses related to any phenomenon that is gauged in multiple studies.

combining means for multiple outcomes - one group


I am looking for a formula to combine multiple outcomes for one group. For example, some of the studies I'm including in my analysis give separate results for each subscale of the main outcome (e.g., aggression) for each group, and I need to combine their Means and Standard Deviations so that I get one outcome for each group.

I have the formula for combining subgroups into one group (SS_total / (sum of n - 1)) but haven't found anything for combining means and SDs for one group.
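
For reference, the subgroup-combination formula mentioned above can be sketched as follows (the function name and numbers are illustrative, not from any package). Note that it applies only to pooling *independent* subgroups into one group; it is not a valid way to combine correlated subscale scores measured on the same participants.

```python
import math

def combine_subgroups(ns, means, sds):
    """Pool independent subgroups into one group.

    Combined mean: sample-size-weighted mean of the subgroup means.
    Combined SD: sqrt(SS_total / (N - 1)), where SS_total adds each
    subgroup's within-group sum of squares to the between-group
    deviation of its mean from the grand mean.
    """
    N = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    ss_total = sum((n - 1) * s**2 + n * (m - grand_mean)**2
                   for n, m, s in zip(ns, means, sds))
    return grand_mean, math.sqrt(ss_total / (N - 1))

# Illustrative: two subgroups of n = 3 with means 2 and 5, each SD = 1
m, s = combine_subgroups([3, 3], [2.0, 5.0], [1.0, 1.0])
```

As a sanity check, pooling the two halves of a dataset this way reproduces the mean and SD of the full dataset.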

Thank you very much!



Replies to This Discussion

Because different subscales routinely have different SDs, the safest thing is to calculate a standardized mean difference effect size for each subscale and then average afterwards to form your aggregate.
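
As a sketch of that strategy (illustrative numbers only; `smd` is a hypothetical helper computing Cohen's d with a pooled SD):

```python
import math

def smd(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

# One SMD per subscale, then a simple (unweighted) average afterwards
ds = [smd(12.0, 10.0, 4.0, 4.0, 30, 30),   # illustrative subscale 1
      smd(20.0, 17.0, 6.0, 6.0, 30, 30)]   # illustrative subscale 2
d_avg = sum(ds) / len(ds)
```

Standardizing each subscale first sidesteps the problem that the subscales' raw means and SDs are on different metrics.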

Some would argue that you could correct the average effect size using the subscales' effective reliability, but I am a little leery of doing so (because such a correction probably cannot be applied uniformly across the studies in a given meta-analysis).

Hope that helps!

I recently responded to a similar question posted to the SEMNET email discussion list.  The gist of that response was that correlations between subscales should probably be used to obtain any type of "composite" result to be meta-analyzed: Most statistically defensible approaches require these correlations for either the effect size or its sampling/conditional variance.  Beyond that, I'll offer a few additional thoughts to supplement Blair's advice.

First, it's unclear how the means and SDs will be used in the meta-analysis; explaining a bit more about this might help potential responders address your specific situation more accurately.  For instance, do you intend to meta-analyze each group's means or SDs directly, in their original metric?  That's relatively rare but certainly feasible when all studies' outcome variables are measured the same way.  I suspect that you instead plan to meta-analyze some sort of (possibly standardized) mean difference between two groups; if you have M subscales, then you'll need their M(M - 1) / 2 correlations for each group.
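
(As a quick check on that count, for illustration:)

```python
def n_correlations(M):
    """Number of distinct pairwise correlations among M subscales: M(M - 1) / 2."""
    return M * (M - 1) // 2

# e.g., 3 subscales need 3 correlations per group; 4 subscales need 6
```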

Second, Blair's suggestion to aggregate standardized mean differences (SMDs) is a common strategy for handling such "multiple-endpoint" effect sizes, which are discussed in varying degrees of detail in the following chapters and article:

  • Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Multiple outcomes or time-points within a study. In M. Borenstein, L. V. Hedges, J. P. T. Higgins, & H. R. Rothstein, Introduction to meta-analysis (pp. 225-238). Chichester, UK: John Wiley & Sons.
  • Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 357-376). New York: Russell Sage Foundation.
  • Timm, N. H. (1999). Testing multivariate effect sizes in multiple-endpoint studies. Multivariate Behavioral Research, 34, 457-465. doi:10.1207/S15327906MBR3404_3

Third, one complication to consider when aggregating multiple-endpoint SMDs from multiple subscales is that there's more than one way to compute an "average" or "composite" SMD, depending essentially on how each subscale is weighted.  A simple (i.e., unweighted) mean might be sensible, but in some situations other approaches might yield a composite SMD with better statistical properties (e.g., an optimally weighted mean whose sampling variance is as small as possible -- essentially based on a within-study meta-analysis of correlated SMDs).  Whether these latter approaches are worth the (usually) extra effort depends on issues I won't delve into here (e.g., rationale for using composite, psychometric properties of subscales, relations among subscales, computational and other resources).
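
To make the weighting idea concrete, here is a minimal sketch of the optimally weighted (minimum-variance, GLS-type) composite for the special case of two correlated SMDs, where the general weights w ∝ V⁻¹1 reduce to a closed form (the function name is hypothetical and the numbers illustrative):

```python
def gls_weights_2(v1, v2, c):
    """Minimum-variance weights for averaging two correlated SMDs.

    With covariance matrix V = [[v1, c], [c, v2]], weights proportional
    to the row sums of V^{-1} minimize the composite's sampling variance;
    for two endpoints this reduces to w1 : w2 = (v2 - c) : (v1 - c).
    Assumes c < min(v1, v2) so both weights are positive.
    """
    w1, w2 = v2 - c, v1 - c
    total = w1 + w2
    return w1 / total, w2 / total

# Equal sampling variances -> equal weights, i.e., the simple mean
w_equal = gls_weights_2(0.05, 0.05, 0.02)
# Unequal variances -> the more precise SMD gets more weight
w_unequal = gls_weights_2(0.04, 0.08, 0.02)
```

Whether this buys much over the simple mean depends on how unequal the subscales' variances are, which is part of the cost-benefit judgment described above.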

Finally, care is required to compute an appropriate sampling variance for the composite SMD.  How this variance is computed depends on how the composite SMD was computed; it'll be different for a simple mean of the subscales' SMDs than for another weighting scheme.  The formula will probably involve matrix computations (e.g., Gleser & Olkin, 2009), though this could be re-expressed using several summations and multiplications.  This is one place where correlations among subscales are needed, besides for some versions of the composite SMD itself; as you might imagine, this variance would be rather different if all correlations between subscales were 1.0 versus 0.0 or had other more realistic correlation-matrix patterns.  I've often seen meta-analysts use unjustifiable strategies to obtain this variance (e.g., averaging the constituent SMDs' variances) -- sometimes with badly distorted results.
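
For the simple-mean composite, the appropriate sampling variance (cf. the multiple-outcomes chapter of Borenstein et al., 2009) can be sketched as follows; the numbers are illustrative, and `corr[i][j]` holds the correlation between SMDs i and j:

```python
import math

def var_mean_smd(variances, corr):
    """Sampling variance of the simple mean of m correlated SMDs.

    Var(mean d) = (1/m^2) * (sum_i v_i + 2 * sum_{i<j} r_ij * sqrt(v_i * v_j))
    """
    m = len(variances)
    total = sum(variances)
    for i in range(m):
        for j in range(i + 1, m):
            total += 2 * corr[i][j] * math.sqrt(variances[i] * variances[j])
    return total / m**2

# Two SMDs, each with sampling variance 0.04:
v_r1 = var_mean_smd([0.04, 0.04], [[1.0, 1.0], [1.0, 1.0]])  # r = 1.0
v_r0 = var_mean_smd([0.04, 0.04], [[1.0, 0.0], [0.0, 1.0]])  # r = 0.0
```

With r = 1.0 the composite's variance stays at 0.04, but with r = 0.0 it halves to 0.02; simply averaging the constituent variances would give 0.04 in both cases, which is exactly the unjustifiable shortcut described above.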

Sorry if that seems complicated.  For better or for worse, this is one of many meta-analysis topics that seem simple at first glance but actually involve some tricky issues.


I am having similar difficulties with this issue. I would like to combine the 3 subscales of the Impact of Event Scale-Revised to calculate a total score. I have been told that I can simply sum the variances (by squaring the standard deviations), divide by 3, and take the square root to obtain the total SD. Would this be a method I could use?

I have also looked at studies to find the correlations between the subscales, and these vary.

Feeling a bit stuck with, and overwhelmed by, the number of methods and associated caveats, so any advice would be most appreciated. Thanks,
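
One way to see why those subscale correlations matter: the variance of a total (sum) score includes covariance terms, which the square-divide-square-root shortcut ignores. A sketch with illustrative numbers (`sd_of_sum` is a hypothetical helper, not from any package):

```python
import math

def sd_of_sum(sds, corr):
    """SD of a total (sum) score from subscale SDs and correlations.

    Var(sum) = sum_i sd_i^2 + 2 * sum_{i<j} r_ij * sd_i * sd_j.
    Dropping the correlation terms understates the total SD whenever
    the subscales are positively correlated.
    """
    var = sum(s**2 for s in sds)
    m = len(sds)
    for i in range(m):
        for j in range(i + 1, m):
            var += 2 * corr[i][j] * sds[i] * sds[j]
    return math.sqrt(var)

# Three subscales, each SD = 2 (illustrative):
ones = [[1.0] * 3 for _ in range(3)]                      # all r = 1.0
zeros = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
sd_hi = sd_of_sum([2.0, 2.0, 2.0], ones)    # perfectly correlated
sd_lo = sd_of_sum([2.0, 2.0, 2.0], zeros)   # uncorrelated
```

With r = 1.0 everywhere the total SD is 6.0 (the SDs simply add), but with r = 0.0 it is only sqrt(12) ≈ 3.46, so the answer depends heavily on which correlations you plug in.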




© 2017   Created by Blair T. Johnson.