Tag Archives: Statistics

More On Psychology Papers Overstating Evidence

Here’s a follow-up to my previous post on a new paper suggesting that evidence is often overstated in psychology papers. You can find the actual paper here. As the news articles state, the paper does not show that false results are being reported. Rather, it shows that the significance of measured effects is systematically lower in replication studies than in the originals. Furthermore, the significance of the replications increases with the significance of the original result, so the original papers may well be getting the correct result while overestimating its significance.

Suppose we make the most charitable assumption: that all the studies are being conducted correctly and accurately reported. Can we still come up with an explanation for why published results are generally more significant than reality? I think we can. Perhaps the most telling quote in the Times article is the one from Norbert Schwarz of the University of Southern California:

“There’s no doubt replication is important, but it’s often just an attack, a vigilante exercise.”

If his opinion is typical of researchers in psychology, then it means that much of the field is suspicious of studies attempting to replicate previous results. I would argue that a bias in favor of more significant results could easily be the result of the following two hypotheses:

  1. Replication is seen by many as unprofessional or rude
  2. Non-significant results are more difficult to publish, particularly in high-quality journals

The second point would be the principal cause of a bias, with the first preventing people from discovering problems.

I have heard that in most fields, non-significant results are often viewed as unpublishable or unimportant. (This is not the case in particle physics and astrophysics, where there are well-developed quantitative models to test, but in most fields such models don’t really exist and probably can’t.) Assuming this is true, suppose that 20 research groups independently perform identical studies searching for some effect that does not exist. On average, around one of those groups will find a result with a p-value less than 0.05, a common threshold for statistical significance. If all 20 groups submit their results to a journal, and only significant results are published, then the one or so study showing statistical significance is published and none of the others are.
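The expected number of false positives here is easy to check with a quick simulation. This is only a sketch; the group count, significance threshold, and number of trials are the illustrative values from the paragraph above, not from any real study:

```python
import random

random.seed(42)

N_GROUPS = 20        # independent groups all studying a nonexistent effect
ALPHA = 0.05         # p-value threshold for "statistical significance"
N_TRIALS = 10_000    # number of simulated worlds

# Under the null hypothesis (no real effect), each group's p-value is
# uniformly distributed on [0, 1], so each group has probability ALPHA
# of a "significant" false positive.
total_significant = 0
for _ in range(N_TRIALS):
    total_significant += sum(1 for _ in range(N_GROUPS) if random.random() < ALPHA)

avg_significant = total_significant / N_TRIALS
print(f"Average 'significant' findings per {N_GROUPS} null studies: {avg_significant:.2f}")
```

The expected value is simply N_GROUPS × ALPHA = 1.0: about one spurious "discovery" per 20 null studies, and that one is the study most likely to see print.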

The literature thus shows that a significant effect was found in one study but says nothing about the other 19 or so that found nothing. Because replication is seen as unprofessional, attempting to publish a replication study could harm researchers’ careers. Even if such a study were done, it might not get published, since many in the field view replication with suspicion.

So, the positive result remains the only result in the literature even though it is not significant when all studies are considered. The literature then represents a heavily biased sample of all studies being done, even though there is nothing inherently wrong with any individual study. When studies are replicated, we would then expect significance to fall, because we are suddenly considering a less biased sample of results than the journals offer.

There are also plenty of other ways to get distorted results from these two assumptions. If people know that only results reaching a certain level of significance get published, there is a huge incentive to massage the data until it crosses that threshold. We end up with a system where everything is biased toward finding positive results, and the lack of replication creates a ratchet effect: once a positive result is found, later results challenging it are discouraged.

Physics is somewhat protected from this sort of problem by a culture where null results are not just acceptable but actually preferred in many cases. In a field like dark matter, a non-null result is viewed with suspicion, while limit contours are the standard published result. This may cause a bias in the other direction, where positive results are discouraged, but it also means that the community won’t accept something new until there is overwhelming evidence in its favor.


Get Your Vaccines!

Io9 has a chart from the CDC showing the number of measles cases in the US in recent years. There have been hundreds of cases so far this year. Many recent outbreaks seem to be associated with places (like schools) having unusually large numbers of unvaccinated people. Opposition to vaccines has been growing in recent years, with much of it ultimately due to discredited papers claiming a link between vaccines and autism. Even if that link were true (it’s not, as far as anyone has been able to tell), vaccines save enormous numbers of lives every year. Vaccines don’t work for everyone, so a high vaccination rate is needed to prevent outbreaks. While one person can skip vaccines without causing problems for everyone else, if too many people do, you end up with the current situation, where diseases that were nearly eradicated come back. So, in summary, get your vaccines and vaccinate your children.

On Michelle Malkin, Or, How to Lie With Statistics

As I mentioned in a previous post, Michelle Malkin recently wrote an article tangentially related to the Ferguson, MO shooting that is an excellent example of how to mislead readers with statistics.

For some background, Malkin is one of the most odious figures in the American political landscape. She literally wrote a book defending the use of concentration camps and racial profiling against disfavored minorities, using the examples of Japanese Americans in the Second World War and Arab and Muslim Americans in the aftermath of the September 11th attacks. Reason, another conservative-ish magazine, which (along with many others) eviscerated that book, asking, “Could it be that she actually supports the idea of detaining American Arabs and Muslims?”, has also responded to the article; I’ll reference that response later. I’m choosing this response because the two publications often find themselves on the same side of political arguments. Reason tends to lean toward anti-government activism, while the National Review still seems stuck in a Cold War anti-Communist mindset (with occasional, probably disingenuous, forays into libertarianism), seeing Stalinists hiding under every rock and behind every bush (and Obama too).

The gist of Malkin’s article is that we shouldn’t be concerned about overzealous and authoritarian police actions because being a police officer is a dangerous job. This central point is a fallacy. Being a police officer may be a dangerous job, and the vast majority of police officers may be doing their best to protect the average citizen, but this does not mean that the public should not concern itself with police misconduct or even the appearance of police misconduct. Public oversight of public institutions is a key part of democratic governance. If a significant part of the population does not feel that such an important institution as law enforcement is supporting their interests (regardless of whether this feeling is factually correct), that is a problem for everyone. Institutions depend on the cooperation and support of the people.

Malkin’s logic doesn’t hold up to scrutiny, but let’s now look at her statistics. First, she mentions that over a recent 10-year period, there were 1501 deaths of police officers. The title interprets this as saying that an officer is “killed” every “58 hours.” Reason has tracked down the source of this statistic. Malkin’s article correctly summarizes the death statistic from the source; the title, however, is deeply misleading. “Killed” implies that the officers were murdered, but the source splits the deaths into a number of different classes and shows that many were due to things like illness and car crashes. Even the figure for shootings can be misleading: it may include accidents and suicides in addition to murders. The number of police officers actually murdered is likely much smaller than this figure. Any number of murders is bad, but Malkin is grossly exaggerating the danger of being an officer compared to the average job.

On her first “fact”: again, the figure of 100 officers “killed” is suspect for the reasons above. The numbers of assaults and injuries are unsourced, so we have no idea of their context. Are these police reports, criminal complaints, indictments, convictions, or something else like extrapolated polling data? Overuse of minor charges such as disorderly conduct and resisting arrest (i.e., “contempt of cop” in unjustified cases; see people arrested for assaulting an officer’s fist with their faces) is a widely known phenomenon, so we don’t know how many of these are legitimate.

On the second “fact”: again, these figures have no context. They are clearly from a different dataset than the earlier 10-year figures; I would guess they are all-time totals. A few things: New York has been the largest city in the US nearly since the country’s founding. It would not be surprising if the NYPD has seen the most deaths of any municipal police department when it is both larger and older than nearly every other major department. Likewise, Texas is now the second largest state and has been one of the largest for decades. It, too, should be expected to have more deaths than most states simply because of its size.

Total deaths is just not a very useful statistic. A rate per capita per year, whether deaths per police officer per year or deaths per total population per year, is better for comparison, since it removes most size and age effects (though there are many other reasons a direct comparison may be difficult). Not that this matters for the article, which does not compare the numbers to anything: we have no idea how the death rate compares to the general population or to other traditionally dangerous jobs. Additionally, the article is talking about policing today but provides historical figures. Ten-year figures are reasonable because they cover the recent past, but historical totals like the New York and Texas ones are not. Crime rates have been dropping precipitously since the early 90s, so a historical average over a period longer than 10 years will not accurately represent the current environment. Trends over time are critical to understanding these statistics.

Malkin also compares the number of police deaths during part of August 2014 to the same part of August 2013. The rate this year is 14% higher than last year, which sounds bad. But the numbers are 72 in 2014 and 63 in 2013. Now assume the underlying rate of deaths was constant across 2013 and 2014; that’s not exactly true, but let’s see what happens under this assumption. We can then estimate the average number of deaths in this part of August as (72 + 63)/2 = 67.5. We can probably safely assume these are primarily single-death incidents (again, little context is provided, but multiple-death incidents are almost certainly a small fraction of the total), in which case the count should be Poisson distributed. The standard deviation is then √67.5 ≈ 8.2. Both the 2013 and 2014 numbers are well within a single standard deviation of this estimated mean, so the difference is not even close to statistically significant. Counting over too short a period leads to numbers that are sensitive to statistical fluctuations. While the 14% increase may be true (I have no reason to doubt the counts), there is no way to tell that it represents a real increase in the death rate rather than statistical noise. Year-to-date totals (or even seasonal totals) involve much larger numbers and would allow a more precise comparison.
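This back-of-the-envelope Poisson estimate takes only a few lines of Python. The two counts are the ones quoted from Malkin’s article; the constant-rate assumption is mine, as discussed above:

```python
import math

deaths_2013 = 63
deaths_2014 = 72

# Assume a common underlying rate and estimate the mean from both years.
mean = (deaths_2013 + deaths_2014) / 2   # 67.5
sigma = math.sqrt(mean)                  # Poisson standard deviation, ~8.2

# How many standard deviations each year sits from the estimated mean:
z_2013 = (deaths_2013 - mean) / sigma    # ~ -0.55
z_2014 = (deaths_2014 - mean) / sigma    # ~ +0.55

print(f"mean = {mean}, sigma = {sigma:.1f}")
print(f"2013: {z_2013:+.2f} sigma, 2014: {z_2014:+.2f} sigma")
```

Both years land about half a standard deviation from the mean, far short of the 2–3 sigma one would usually demand before calling a difference real.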

Malkin ends the piece with descriptions of several recent murders of police officers and then a gratuitous jab at Al Sharpton, a favorite bogeyman of right-wing commentators who hasn’t been relevant in a long time. There aren’t any stats here, but it’s clearly designed to elicit an emotional response in the reader against more liberal-minded people (represented by Sharpton).

An important question, then, is why Malkin included such suspect statistics and poor logic. In a Twitter response to the Reason author (linked in the Reason article), Malkin insinuates that it’s not due to ignorance. Rather, she wrote the piece to push an agenda: support police action no matter what, particularly when her political opponents oppose it. She disclaims responsibility for the statistics on the grounds that it’s her source that’s biased, not her. But evaluating sources is an important part of persuasive writing, particularly with an audience as wide as Malkin’s. Knowingly using unreliable sources without comment is unethical, while inadvertently doing so when one should know better is irresponsible. Malkin places her political agenda above any commitment to the truth. By acting as a purveyor of ignorance, she reveals herself to be a demagogue with no place in debates on serious issues.

Charts: Where US Residents Were Born

The New York Times has a nice interactive page letting you see where people living in each state were born as a function of time. You can see various migration trends, both immigration into the country and movement between states. One of the most prominent features you can see across many states is a large decrease in the fraction of immigrants starting during the Great Depression, with immigration only rebounding in the 90s and later. In some states, the fraction of immigrants never recovered.

Of the places I’ve lived, I was surprised to see New York and Massachusetts tied for the largest fraction of the population born in-state, at 63%. New York has far more immigrants than Massachusetts, though California has the most overall. Other interesting things:

  • There are almost as many native Californians in Nevada as native Nevadans. There are also almost as many immigrants in Nevada as native Nevadans.
  • Louisiana has the highest percentage of people born in the state (79%), while Nevada has the lowest (25%).
  • The states seem pretty evenly split between having the fraction of people born out of state increasing or decreasing.