Tag Archives: Psychology

In Which a Psychologist Completely Misunderstands Physics

About a week ago, the Times published an op-ed responding to that recent Science paper claiming that many psychology papers are overstating their evidence. The op-ed was written by Lisa Barrett, a psychology professor at Northeastern. Her main argument is that failure to replicate results is one of the ways science discovers new things, so psychology papers failing to replicate is not a problem.

While I think there is some truth there, I am not convinced by this argument. In fact, I find the counterexample of subatomic physics failing to replicate Newton’s laws to be incredibly disingenuous. The point the psychology paper was making is that attempts to replicate a result by performing what is more or less the exact same study failed to obtain the same result far more often than would be expected from the experimental uncertainties. One of the nice things about physics is that laboratory conditions can be controlled well enough to perform (almost) identical experiments under (almost) identical conditions. Newton’s laws can be verified fairly easily by almost anyone: set up some springs and pulleys and measure oscillation frequencies, or see how much weight it takes to lift an object attached to a string. There will be small deviations from air resistance, friction, etc., but in the end you will still verify Newton’s laws up to uncertainties from reasonably well-known experimental conditions.
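As a concrete (hypothetical) illustration of how directly such a prediction can be checked, here is a short numerical sketch: integrate a mass on a spring and compare the measured oscillation period to the classical prediction T = 2π√(m/k). The mass, spring constant, and step size are arbitrary choices of mine, not values from the post.

```python
import math

# Sketch: numerically integrate a mass on a spring and check that the
# oscillation period matches the classical prediction T = 2*pi*sqrt(m/k).
# All parameter values here are arbitrary illustrative choices.
m, k = 0.5, 20.0          # mass (kg), spring constant (N/m)
dt = 1e-5                 # integration time step (s)
x, v = 0.1, 0.0           # initial displacement (m) and velocity (m/s)

# Integrate with the symplectic (Euler-Cromer) method and record the times
# of successive upward zero crossings; consecutive crossings are one
# period apart.
t = 0.0
crossings = []
prev_x = x
while len(crossings) < 3:
    v += (-k / m) * x * dt   # Hooke's law: F = -k x
    x += v * dt
    t += dt
    if prev_x < 0.0 <= x:    # upward zero crossing
        crossings.append(t)
    prev_x = x

measured_period = crossings[-1] - crossings[-2]
predicted_period = 2 * math.pi * math.sqrt(m / k)
# measured_period agrees with predicted_period to within the integration
# and crossing-detection error (a few time steps at most).
```

A real tabletop experiment would replace the integrator with a stopwatch, but the comparison is the same: one number predicted by theory, one number measured, and a small, well-understood uncertainty between them.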

Basic classical physics doesn’t break down when you try to replicate an experiment. Rather, it breaks down when you try to extrapolate physical laws a bit too far. Classical mechanics describes things at macroscopic size traveling at speeds much lower than the speed of light. Deviations do exist but are almost always orders of magnitude too small to measure in a realistic experiment. Subatomic physics typically breaks both the assumptions of (1) macroscopic scale and (2) low velocity, so the equations governing our everyday life stop working. Furthermore, this example compares experimental tests of scientific theories to comparisons between different experimental results. If the experiments are truly equivalent, they should obtain the same result regardless of what any theory says. For some measurements there doesn’t even need to be much of a theory at all.

Unfortunately, life sciences and social sciences are often much more difficult to replicate. People and living things aren’t particles; they’re not all identical, interchangeable quanta. Selection effects are going to be very difficult to avoid or control, so I think it is natural not to require the same kind of precision that we can get in something like physics. It may even be true that many of the results in the paper failed to replicate because the populations in the original and new studies were not equivalent. Maybe more care needs to be taken when choosing test subjects in replication studies, but until someone can identify exactly how to do that, this remains a problem. The whole point of scientific experiments is to generate data that can be used to build broader scientific theories. If scientists perform a number of seemingly identical studies and get an equal number of incompatible results, then there isn’t really any way to inform theories other than to say that we don’t yet understand the experimental data.

Even the example of giving a rat a shock suggests that the author has a fundamental misunderstanding of what it means to replicate a study. She mentions that different results are obtained depending on the exact experimental procedure. This is not surprising. If you change an experiment, different things can happen because you are no longer replicating the original experiment; you are instead testing whether your earlier result still applies in different circumstances. This case is also probably more akin to what happens in psychology studies than any example in physics. There will always be slight differences between studies using human subjects, and if those variations aren’t understood, any result can be rendered almost meaningless. (As far as I know, the researchers studying rats knew how to control their experimental conditions, so this wasn’t a problem for their measurements.)

As I said at the beginning, there still are some worthwhile points here. Sometimes a failure to replicate really is a sign that something interesting could be happening. In neutrino physics, there is a series of “anomalies” in which different experiments don’t match one another. In dark matter detection, there are conflicting measurements of possible dark matter signals and dark matter exclusion curves. There have even been a number of high-profile blunders in physics. These things happen. New measurements are being done to try to resolve these conflicts, whether they stem from experimental errors or real effects. The big problems that seem to have been identified in psychology are that (1) these new measurements often aren’t being done and (2) measurements consistently fall on the side of having stronger statistical significance than they should. (1) means that errors can’t be easily identified, and (2) suggests that published results are both biased and consistently underestimate systematic uncertainties.

Report: Psychology Papers Often Overstate Evidence

It hasn’t been a great year for social science, with several high-profile scandals involving faked or bad data. The New York Times has reported on a new article in Science looking into the reproducibility of 100 psychology articles. The paper concludes that 60 of these papers seemed to have issues. Fortunately, the authors did not uncover any fraudulent or outright false results, but it does seem like the papers were consistently overstating the evidence for their conclusions.

I haven’t had time to read the paper yet, but there are a few reasons why I think this could occur, so I might write a follow-up post or two about this. The Times article does mention that the studies replicating the 100 papers were required to have the input of the original authors to make sure that the studies were as similar as possible.

I think it is important to note that the physical sciences are not immune to pathological results. Sometimes bad results happen even if everything is done correctly. Statistics all but guarantees this: given enough perfectly designed and executed studies, some are bound to be wrong. Sometimes the authors simply forget to account for a systematic uncertainty, or maybe there is a problem with the theoretical model being tested. That is why we often see “3-sigma” effects in high energy physics disappear into noise. While “3-sigma” nominally implies only about a 0.3% chance of such a fluctuation, there are still a lot of caveats. In something like the social sciences, the kind of precision found in much of physics is simply impossible to achieve, so the chance of getting a false result is likely to be much higher.
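To put a rough number on that caveat, here is a quick sketch (my own, not from the article) of the two-sided probability of a fluctuation at least 3 sigma from the mean of a normal distribution, and the number of false “discoveries” one would then expect from many independent null studies; the study count is an arbitrary assumption.

```python
import math

def two_sided_p(n_sigma):
    """P(|Z| >= n_sigma) for a standard normal Z, via the complementary
    error function."""
    return math.erfc(n_sigma / math.sqrt(2))

p3 = two_sided_p(3.0)            # about 0.0027, i.e. roughly 1 in 370
n_studies = 1000                 # hypothetical number of true-null studies
expected_false = n_studies * p3  # a few 3-sigma flukes expected by chance
```

So even with flawless methodology, a field running a thousand studies of true null effects should expect a handful of 3-sigma results; any additional bias or underestimated systematic uncertainty only inflates that number.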

Hopefully someone will vet this paper to see if it, too, has bad methodology.