Category Archives: Stats

Get Your Vaccines!

Io9 has a chart from the CDC showing the number of measles cases in the US in recent years. There have been hundreds of cases so far this year. Many recent outbreaks seem to be associated with places (like schools) having unusually large numbers of unvaccinated people. Opposition to vaccines has been growing in recent years, with much of it ultimately due to discredited papers claiming a link between vaccines and autism. Even if that link were real (it’s not, as far as anyone has been able to tell), vaccines save enormous numbers of lives every year. Vaccines don’t work for everyone, so a high vaccination rate is needed to prevent outbreaks from occurring. While one person can avoid vaccines without causing problems for everyone else, if too many people do so you end up with the current situation, where diseases that were nearly eradicated come back. So, in summary, get your vaccines and vaccinate your children.


On Michelle Malkin, Or, How to Lie With Statistics

As I mentioned in a previous post, Michelle Malkin recently wrote an article tangentially related to the Ferguson, MO shooting that is an excellent example of how to mislead readers with statistics.

For some more background, Malkin is one of the most odious figures in the American political landscape. She literally wrote a book defending the use of concentration camps and racial profiling against disfavored minorities, using the examples of Japanese Americans in the Second World War and Arab and Muslim Americans in the aftermath of the September 11th attacks. Reason – another conservative-ish magazine which (along with many others) eviscerated this book, asking “Could it be that she actually supports the idea of detaining American Arabs and Muslims?” – has also responded to the article. I’ll reference that response later. I’m choosing this response because the two publications often find themselves on the same side in political arguments. Reason tends to lean toward anti-government activism, while the National Review still seems to be stuck in a Cold War anti-Communist mindset (with occasional, probably disingenuous, forays into libertarianism), seeing Stalinists hiding under every rock and behind every bush (and Obama too).

The gist of Malkin’s article is that we shouldn’t be concerned about overzealous and authoritarian police actions because being a police officer is a dangerous job. This central point is a fallacy. Being a police officer may be a dangerous job, and the vast majority of police officers may be doing their best to protect the average citizen, but this does not mean that the public should not concern itself with police misconduct or even the appearance of police misconduct. Public oversight of public institutions is a key part of democratic governance. If a significant part of the population does not feel that such an important institution as law enforcement is supporting their interests (regardless of whether this feeling is factually correct), that is a problem for everyone. Institutions depend on the cooperation and support of the people.

Malkin’s logic doesn’t hold up to scrutiny, but let’s now look at her statistics. First, she mentions that over a recent 10 year period, there were 1501 deaths of police officers. The title interprets this as saying that an officer is “killed” every “58 hours.” Reason has tracked down the source of this statistic. The article correctly summarizes the death statistic in the source. The title, however, is deeply misleading. “Killed” implies that the officers were murdered. The source splits up the deaths into a number of different classes and shows that many of these were due to things like illness and car crashes. Even the figure for shootings can be misleading: it may include accidents and suicides in addition to murders. The number of police officers murdered is likely to be much smaller than this figure. Any number of murders is bad, but Malkin is grossly exaggerating the danger of being an officer compared to the average job.
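The headline arithmetic is easy to check. The short sketch below converts 1501 deaths over 10 years into an average interval between deaths. The function name and the 365.25-day year are my own assumptions for illustration; neither comes from Malkin's article or her source.

```python
def hours_between_events(count, years, days_per_year=365.25):
    """Average number of hours between events, given a total count over a span of years."""
    # days_per_year=365.25 is an assumed average that accounts for leap years.
    total_hours = years * days_per_year * 24
    return total_hours / count


# Figures quoted in the article: 1501 officer deaths over 10 years.
interval = hours_between_events(1501, 10)
print(f"One death every {interval:.1f} hours on average")
```

The result rounds to 58 hours, so the headline number matches a count of all deaths from every cause, not a count of murders.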

In her first “fact,” the figure of 100 officers “killed” is again suspect for the reasons above. The numbers of assaults and injuries are not sourced, so we have no idea what the context is. Are these police reports, criminal complaints, indictments, convictions, or something else like extrapolated polling data? Overuse of minor charges such as disorderly conduct, resisting arrest, etc. (i.e. “contempt of cop” in unjustified cases – see people arrested for assaulting an officer’s fist with their faces) is a widely known phenomenon, so we don’t know how many of these are legitimate.

In the second “fact,” the figures again have no context. It is obvious that these numbers are from a different dataset than the earlier 10 year figures. I would guess that these are all-time figures. A few things: New York has been the largest city in the US nearly since the beginning of the country. It would not be surprising if the NYPD has seen the most deaths of any municipal police department when it is both larger and older than nearly every other major department. Likewise, Texas is now the second largest state and has been one of the largest states for decades. It, too, will be expected to have more deaths than most states simply because of size.

Total deaths is just not a very useful statistic. A rate per capita per year is better for comparisons, whether that is the number of deaths per police population per year or the number of deaths per total population per year. This removes many size and age effects, though there are many other reasons why a direct comparison may be difficult. This has no bearing on the article because the article does not compare the numbers to anything. We have no idea how the death rate compares to the general population, or to other traditionally dangerous jobs. Additionally, the article is talking about policing today but provides historical figures. Ten year figures are reasonable because it’s all in the recent past, but historical figures like the NY and TX ones are not. Crime rates have been dropping precipitously since the early 90s, so a historical average over a period of time greater than 10 years will not accurately represent the current environment. Trends over time are critical in understanding these statistics.

Malkin also provides a comparison of the number of police deaths during part of August 2014 to the same part of August in 2013. The rate for this year is 14% higher than last year, which sounds bad. But the numbers are 72 in 2014 and 63 in 2013. Now let’s assume that the underlying rate of deaths was the same in 2013 and 2014. It’s not exactly true, but I’d like to see what happens if we make this assumption. Then we can estimate that the average number of deaths in this part of August is (72+63)/2 = 67.5. We can probably safely assume that these are primarily single deaths (again, little context is provided, but multiple death incidents are almost certainly a small fraction of deaths). In this case, the number should be Poisson distributed. The standard deviation is then √67.5 ≈ 8.2. You can see that both the 2013 and 2014 numbers are well within a single standard deviation of this estimated mean (each differs from it by only 4.5). The difference is simply not even close to statistically significant. This is because integrating the number over too short a period leads to numbers that are sensitive to statistical fluctuations. While this 14% increase may be true (I have no reason to doubt it), there is no way for us to tell that it represents a real increase in the death rate and not statistical noise. The year-to-date totals (or even seasonal totals) will have much larger numbers, allowing for a more precise comparison.
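For anyone who wants to reproduce the estimate, here is a minimal sketch of the calculation above. It assumes, as the text does, that both counts are independent Poisson draws from a common rate. The function name is hypothetical, and the last quantity it returns (the difference between the two years in units of its own standard deviation) is an extra step I am adding for illustration, not something taken from the article.

```python
import math


def poisson_comparison(count_a, count_b):
    """Compare two counts assumed to be independent Poisson draws with a common rate."""
    mean = (count_a + count_b) / 2
    # For a Poisson distribution the variance equals the mean.
    sigma = math.sqrt(mean)
    # The variance of a difference of two independent Poisson counts is the sum
    # of their means, estimated here by the sum of the observed counts.
    sigma_diff = math.sqrt(count_a + count_b)
    z = (count_a - count_b) / sigma_diff
    return mean, sigma, z


# Counts quoted in the article: 72 deaths in 2014 and 63 in 2013.
mean, sigma, z = poisson_comparison(72, 63)
print(f"estimated mean = {mean:.1f}, standard deviation = {sigma:.1f}")
print(f"2014 sits {(72 - mean) / sigma:.2f} standard deviations above the mean")
print(f"difference between the years = {z:.2f} standard deviations")
```

Under these assumptions the gap between the two years is less than one standard deviation, far short of the two or more usually wanted before calling a difference significant.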

Malkin ends the piece with descriptions of several recent murders of police officers and then a gratuitous jab at Al Sharpton, a favorite bogeyman of right-wing commentators who hasn’t been relevant in a long time. There aren’t any stats here, but it’s clearly designed to elicit an emotional response in the reader against more liberal-minded people (represented by Sharpton).

An important question, then, is why did Malkin include such suspect statistics and poor logic? In a Twitter response to the Reason author (linked in the Reason article), Malkin insinuates that it’s not due to ignorance. Rather, she wrote the piece to push an agenda – support police action no matter what, particularly when her political opponents oppose that action. She disclaims any responsibility for the statistics because it’s her source that’s biased, not her. Evaluating sources is an important part of persuasive writing, particularly if you have such a wide audience as Michelle Malkin. Knowingly using unreliable sources without comment is unethical while inadvertently doing so when one should know better is irresponsible. Malkin places her political agenda above her commitment to the truth. By acting as a purveyor of ignorance, she reveals herself to be a demagogue who has no place in debates on serious issues.

Charts: Where US Residents Were Born

The New York Times has a nice interactive page letting you see where people living in each state were born as a function of time. You can see various migration trends, both immigration into the country and movement between states. One of the most prominent features you can see across many states is a large decrease in the fraction of immigrants starting during the Great Depression, with immigration only rebounding in the 90s and later. In some states, the fraction of immigrants never recovered.

Of places I’ve lived, I was surprised to see New York and Massachusetts tied for having the largest fraction of the population born in-state at 63%. New York has far more immigrants than Massachusetts, although California has the most overall. Other interesting things:

  • There are almost as many native Californians in Nevada as native Nevadans. There are also almost as many immigrants in Nevada as native Nevadans.
  • Louisiana has the highest percentage of people born in the state (79%) while Nevada has the lowest (25%).
  • The states seem pretty evenly split between those where the fraction of people born out of state is increasing and those where it is decreasing.