Tools for Those Who Summarize the Evidence Base
If you are reading this blog, you like numbers. Perhaps you even love them. Even if you are a dedicated qualitative researcher, I’d urge you to continue reading; this blog might be even more valuable to you. And if you are a quant geek, you might gain more of an appreciation for qualitative perspectives. Moreover, the issues I raise are relevant not only to the context of meta-analysis but also to other data-analytic strategies. They are relevant not only to scientists themselves but also to those who consume their results. They are extremely relevant to those of us who are launching new studies. Perhaps they are most relevant to journalists and policy makers because they help to set trends and conventions. And don’t worry, I’ll use a dash of humor or two as we proceed.
These days, the internet seems awash with “big data” or just “data” without the “big.” The same conclusion applies to media more broadly. I do not want to sidetrack this entry into the problems of treating the word “data” as singular (when it is in fact plural). Heck, even I think the sentence “Big data are everywhere” is awkward. This particular horse has left the stable: We will be forever stuck with “big data” being a singular noun. “The data is as the data are,” as the Collective Curmudgeon put it (in fact, I’ll admit I wrote that particular anecdote as a rant on the subject). (Caveat: Commander Data, the character in the popular science fiction series, Star Trek: The Next Generation, is definitely a singular noun, as one of my meta-analysis seminar students wryly noted.)
The real problem is the sloppiness that ensues when people assume that statistics are data, because the two are only rarely the same thing. Part of the misunderstanding probably stems from the fact noted in the prior paragraph: if everything numeric is “data,” then surely the statistics computed from data must be data, too.
Not so fast.
As a ready example, look at this Stata database about automobile economy in the late 1970s. In reality, a “database” is merely rows of observations and columns of variables: numbers and qualitative information about some phenomenon. Although some variables are purely qualitative (e.g., make, foreign), the numeric ones are measurements of aspects that, in everyday parlance, we judge only qualitatively. In everyday life, a Cadillac Seville might look like a heavier car than a Buick Opel; putting exemplars of these categories on a scale gives greater precision, although even scales are subject to error and bias.
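To make the point concrete, here is what a slice of such a database looks like in Python with pandas. The makes come from the text above, but the weight, mpg, and foreign values are illustrative approximations of the Stata auto dataset, not quotations from the actual file:

```python
import pandas as pd

# A few rows in the spirit of Stata's classic auto dataset (1978 models).
# Numeric values are illustrative, not exact.
auto = pd.DataFrame({
    "make":    ["Buick Opel", "Cadillac Seville", "VW Rabbit"],
    "weight":  [2230, 4290, 1930],   # pounds, measured on a scale
    "mpg":     [26, 21, 25],
    "foreign": [0, 0, 1],            # qualitative: domestic vs. foreign
})

# The scale gives a precision that everyday eyeballing lacks:
ratio = auto.loc[1, "weight"] / auto.loc[0, "weight"]
print(f"The Seville weighs {ratio:.2f} times as much as the Opel")
```

Note that the qualitative variable `foreign` has been coded numerically (0/1), which is itself an interpretive decision, not a raw fact of the world.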
The observations also embed assumptions. For example, if we put these two cars on the moon, they would weigh far less, although the Seville would still be the same proportion heavier. (Their mass would remain the same.) In the current case, you could also question whether specific exemplars of Sevilles and Opels always weigh the same; different options may make weight higher or lower, yet this database appears to have single observations of each make. Thus, this database will not be informative about varieties of Sevilles and Opels and other makes, only about the makes in general. (Of course, it also is not very applicable to contemporary automobiles, which thankfully have far better fuel economy than these late-1970s dinosaurs!)
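The moon thought experiment can be checked with a few lines of arithmetic. The car masses below are hypothetical round numbers, and the gravity constants are standard approximations; the point is that weight depends on an assumption (local gravity) while the proportion does not:

```python
import math

EARTH_G = 9.81  # m/s^2, approximate surface gravity on Earth
MOON_G = 1.62   # m/s^2, approximate surface gravity on the Moon

# Hypothetical masses in kilograms; exact values are assumptions.
seville_kg, opel_kg = 1950, 1010

def weight_newtons(mass_kg, g):
    """Weight depends on local gravity; mass does not."""
    return mass_kg * g

earth_ratio = weight_newtons(seville_kg, EARTH_G) / weight_newtons(opel_kg, EARTH_G)
moon_ratio = weight_newtons(seville_kg, MOON_G) / weight_newtons(opel_kg, MOON_G)

# The absolute weights change dramatically, but the ratio survives
# the change of assumption:
print(f"Earth ratio: {earth_ratio:.3f}, Moon ratio: {moon_ratio:.3f}")
```

The observation “the Seville is about twice as heavy” is robust to the gravity assumption; the observation “the Seville weighs 19,000 newtons” is not.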
Similarly, observational methods in studies also make assumptions, however small or large they may be. There is the presumption that the variable is in fact validly measured, rather than merely appearing face valid. Self-reports of some variables (e.g., church attendance) are routinely exaggerated, according to some accounts. Methodologists have determined that some biases make effects smaller (e.g., low reliability or validity) whereas other artifacts might make effects larger (e.g., differential attrition between treatment and control groups). Work on methodological quality in meta-analysis suggests that these biases and artifacts may or may not be correlated with each other in a given study and that indicators of methodological quality vary widely across domains of scientific inquiry.
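One well-known example of a bias that makes effects smaller is attenuation due to unreliable measures, and the classic Spearman correction from the psychometric (Hunter–Schmidt) tradition shows how methodologists adjust for it. The reliabilities and observed correlation below are hypothetical:

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation: unreliable measures
    shrink observed correlations below their 'true' values."""
    return r_obs / math.sqrt(rel_x * rel_y)

# An observed r of .30, with hypothetical reliabilities of .70 and .80:
r_corrected = disattenuate(0.30, 0.70, 0.80)
print(f"Corrected correlation: {r_corrected:.3f}")
```

Even this correction rests on assumptions of its own (e.g., that the reliability estimates are accurate), which is exactly the point of this section.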
Moreover, gathering and cleaning variables to create databases can be quite a time- and labor-intensive process, despite the great ease with which some large databases can be gathered, such as millions of clicks on a website. “But practically, because of the diversity of data, you spend a lot of your time being a data janitor, before you can get to the cool, sexy things that got you into the field in the first place,” said Matt Mohebbi, a data scientist cited in a recent NY Times article; the article concluded:
“Data scientists emphasize that there will always be some hands-on work in data preparation, and there should be. Data science, they say, is a step-by-step process of experimentation.”
In my own SHARP research lab, we have spent countless hours simply renaming variables and re-scoring them in order to make them useful when merged with other databases. Adding geocodes to observations can take a great deal of time. Meta-analyses of large literatures can take months or even years to compile observations in relation to all the included studies.
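The renaming and re-scoring work described above can be sketched in a few lines of pandas. The variable names, scales, and values here are hypothetical, invented purely to illustrate the “data janitor” chores of harmonizing two sources before a merge:

```python
import pandas as pd

# Hypothetical lab data: two sites name and scale the same variable differently.
site_a = pd.DataFrame({"subj": [1, 2], "dep_cesd": [22, 35]})      # 0-60 scale
site_b = pd.DataFrame({"id": [3, 4], "depression_pct": [40, 70]})  # 0-100 scale

# Harmonize the ID name and rescale the score before merging.
site_b = site_b.rename(columns={"id": "subj"})
site_b["dep_cesd"] = site_b["depression_pct"] * 60 / 100
site_b = site_b.drop(columns="depression_pct")

# Only now can the two sources be stacked into one usable database.
merged = pd.concat([site_a, site_b], ignore_index=True)
print(merged)
```

Multiply these few lines by hundreds of variables and dozens of sources, and the “countless hours” become easy to believe.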
In short, big databases are often big trouble.
Once a database is ready to go, there is the problem of interpreting what it means. Even in a “small” data study, there are probably more observations and variables than are easily interpretable by the unaided eye, except when the pattern is stark (viz. a large effect size). And the problem grows as the database grows: Data scientists often include numerous variables and observations. Thus, statistics are necessary to help make sense of the morass in databases, big and small, and we even use them to gauge the magnitude of effects in the database, ranging from nil to humongous.
In generating statistics about trends in the database, the data modeler must make assumptions about what variables are important and about what the statistics mean. Some guidance comes from conclusions in past publications, whether from original research studies or from relevant theories in the domain. “There’s nothing so practical as a good theory,” opined social psychology pioneer Kurt Lewin in a 1940s book chapter. I agree about the importance of theories, because they help you focus on the most important variables, but I also think theories can act like blinders. If you have a pet theory, it might prejudice you against alternative theories that make different or more nuanced predictions.
In the case of the automobile database above, some people--theorists--might think of the automobiles as conveyance devices, whereas others might think of them as engines of climate heating and change. Still others might think of automobiles as personality projections of their owners’ deepest selves… or as found art… or as obstacles to an optimal inner city environment for pedestrians and bikers… and so on. In like fashion, results from a study should probably never be considered to have only one conclusion. Taking new perspectives, using a different theory, helps to change the interpretation of the results.
In 2010, with 13 other scientists, I published a multi-level model of HIV risk and AIDS care, the Network Individual Resource (NIR) model. Having invested significant time and energy resources into this model, I am loath to abandon it, but as a scientist I must try to be open-minded. If the perfect theory has been developed, it is unlikely that it will ever be acknowledged as such. (Witness the slow pace of adoption of Darwin’s evolutionary theory; and even evolutionary theory has been significantly tweaked in the last half century.) In return for my open-mindedness, I only request that, when relevant, other scientists do the same. At the least, I’d like them to try the NIR model. Accordingly, in a recent open-access publication focused on multi-level theoretical models for HIV prevention and AIDS care, Kaufman, Cornish, Zimmerman, and I recommended theoretical eclecticism. It is better to gain insights from multiple perspectives.
Some guidance about how to model data comes from the statistics themselves: In meta-analysis, if the effect sizes are homogeneous (i.e., I² or τ² are zero), then it is not possible for more complex models to achieve a better fit. That is, if all studies observed practically the same magnitude effect, then moderators (or effect modifiers) are not needed. If, in contrast, significant heterogeneity emerges (I² or τ² are greater than zero), then more complex models are needed and moderators are used. The choice of moderators is then led by practical or theoretical guidance. It is worth pausing to consider that the assumptions of statistics themselves are a form of theory, yet applied statisticians often act as though the statistics directly gauge the truth. Instead, they should be careful to think about what assumptions might temper such conclusions. In fact, in an era of big data, of millions of observations, contemporary statistics programs often cannot use all observations at once and must make shortcuts such as randomly sampling subsets of observations.
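The homogeneity check described above can be sketched with the standard Q statistic and the DerSimonian–Laird estimator of τ², from which I² follows. The effect sizes and sampling variances below are hypothetical, chosen to show the homogeneous case in which moderators are not needed:

```python
def heterogeneity(effects, variances):
    """Q, tau^2 (DerSimonian-Laird), and I^2 for a set of study effects."""
    w = [1.0 / v for v in variances]              # inverse-variance weights
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # truncated at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, tau2, i2

# Four hypothetical studies reporting nearly identical effect sizes:
effects = [0.10, 0.12, 0.11, 0.09]
variances = [0.02, 0.03, 0.025, 0.02]
q, tau2, i2 = heterogeneity(effects, variances)
print(f"Q = {q:.3f}, tau^2 = {tau2:.3f}, I^2 = {i2:.1f}%")
```

Here Q falls below its degrees of freedom, so τ² and I² are estimated at zero: the effects are statistically indistinguishable, and a moderator analysis would be fitting noise.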
It should now be clear why common usage for the words “data” or “big data” is so sloppy. It is not the data points themselves that make the news, but instead the statistical trends that emerge from the data. We should not leap to the conclusion that “the data support conclusion X” but instead:
Results from potentially biased, conceptually driven statistical models of the data, models that are imperfect and constrained in various respects, support conclusion X.
Carried to its logical extreme, the claim that all studies are impossibly flawed goes too far: Valuable insights emerge from many of the studies scientists conduct, especially as they refine their methods over time. Moreover, these conclusions are bolstered as studies accrue, independent replications emerge, and their conclusions are borne out by systematic reviews and meta-analyses. The field of engineering is stark testimony that generalizable knowledge exists. Face anyone who denies the truth in science with the facts: Flight would not happen without it. Nor computers. Or smartphones.
In the end, we should keep in mind that conclusions about trends in data are limited by the many assumptions involved in compiling and modeling data. Using other nouns might well connote the care we take in reaching conclusions. Conclusions about data are necessarily theoretical and predicated on other assumptions. While it certainly seems easier to say “the data support conclusion X,” it is much wiser and more correct to say that “results support conclusion X” or “models show X.” We should not let the current cool allure of the word “data” tempt us into imprecise language.
In sum, it is difficult to disagree with the conclusion that Rob Low, Hayley MacDonald, and I reached in a recent Psychology & Health meta-review article that focused on the use of methodological quality in meta-analysis:
The data are necessarily theoretical, as are our conclusions about the data.