For the second time in a couple of months, one of my Facebook friends has posted a link to an article – in this case on Daily Kos – about how the apparent increase in life expectancy is not happening for those who fail to graduate high school. In fact, these people’s life expectancy is dropping. In both cases, the article she links goes on to argue against raising the minimum recipient age for Social Security on this basis: it’s unfair to raise the minimum age of eligibility for Social Security benefits if the group that needs it the most, rather than living longer, is actually dying earlier.
This second time around, I got curious and had a look at the data. It comes from this study, and – surprise, surprise! – it’s flawed.
Here’s the offending paragraph from the summary:
It is important to note that the size of the least educated subgroup of the U.S. population has been shrinking in recent decades (down to about 8% for whites). This is good news on the one hand since this suggests that younger cohorts moving up through the age structure are more highly educated than their predecessors. The bad news is that this decline in the least educated population is occurring, in part, because they die younger than their more highly educated counterparts. This leads to what is known as selection bias – that is, the dynamics of the group being investigated are changing with time. This selection process is quite common as it accounts for the changing dynamics of all subgroups of a population facing different levels of mortality – such as smokers versus non-smokers or people at low versus high risk of cancer. This may partially explain the trends in life expectancy we observe here, but the rising death rates are nevertheless quite real.
To which I say COME ON. What you meant to say is that the effect you think you found is a result of the same error involved in Simpson’s Paradox, a well-known statistical illusion, and there’s actually no trend at all.
Simpson’s paradox is … well, let’s leave it to Wikipedia:
Simpson’s paradox (or the Yule-Simpson effect) is a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data.
So, to use the classic example that actually went to the Supreme Court, the graduate school of an institution of higher education – call it UC Berkeley, because that’s its name – was accused of gender bias in admissions. And, if you looked at the big picture, the data was striking: 44% of men who applied were admitted, but only 35% of women. Open-and-shut case, then, right? WRONG – and the reason is interesting. In fact, if you broke it down by individual department, not only did the perceived bias against women disappear, it reversed. In every department to which they applied, women had an equal or greater chance of admission than men. HOW IS THIS POSSIBLE?
It’s possible if you consider that women applied in greater numbers to more competitive departments – like English and Psychology – whereas men applied to less competitive departments – like Physics and Chemistry. Now, I’m using “competitive” here in a really mechanical way: number of seats vs. number of applicants. The point is that in departments like Physics, a larger share of the people who apply get admitted than is the case for English, and women were applying in greater numbers than men to the departments which had a comparatively high number of applicants relative to the seats offered. So, you might have, say, 200 male applicants and only 50 female applicants to the Physics department, but 40 of the 50 females get in (an 80% admission rate for women) and only 100 of the 200 male applicants (a 50% admission rate for men). And for English, you might have 250 male applicants and 400 female applicants, and 50 males and 100 females are admitted – giving a 20% admission rate for men and 25% for women. It should be obvious that you can do this across all departments and end up with an aggregate admission rate for men that is higher than for women, even though women’s chances are actually better than men’s for each department they apply for.
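To check the arithmetic, here is a minimal Python sketch that uses only the made-up Physics and English figures from the paragraph above (they are illustrative, not the real Berkeley numbers). The dictionary layout and the function names are my own choices for the illustration.

```python
from fractions import Fraction

# (applicants, admitted) per department and gender. These are the
# illustrative figures from the text, not the real Berkeley data.
applications = {
    "Physics": {"men": (200, 100), "women": (50, 40)},
    "English": {"men": (250, 50), "women": (400, 100)},
}


def admission_rate(applied, admitted):
    """Share of applicants who were admitted, as an exact fraction."""
    return Fraction(admitted, applied)


def department_rates(data):
    """Admission rate for each gender within each department."""
    return {
        dept: {g: admission_rate(*counts) for g, counts in by_gender.items()}
        for dept, by_gender in data.items()
    }


def aggregate_rates(data):
    """Admission rate for each gender after pooling all departments."""
    totals = {}
    for by_gender in data.values():
        for gender, (applied, admitted) in by_gender.items():
            prev_applied, prev_admitted = totals.get(gender, (0, 0))
            totals[gender] = (prev_applied + applied, prev_admitted + admitted)
    return {g: admission_rate(*t) for g, t in totals.items()}


per_department = department_rates(applications)
overall = aggregate_rates(applications)

for dept, rates in per_department.items():
    print(dept, {g: f"{float(r):.1%}" for g, r in rates.items()})
print("Overall", {g: f"{float(r):.1%}" for g, r in overall.items()})
```

With these numbers, women do better in both departments, yet the pooled rate is 150 of 450 for men (about 33%) against 140 of 450 for women (about 31%). Two departments are already enough to produce the reversal.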
The interesting thing about Simpson’s Paradox is that it forces you to make a judgement call. In the case of UC Berkeley, when we say “gender discrimination in admissions,” do we mean
(1) “the proportion of female applicants who are admitted is lower than the proportion of male applicants”, or
(2) “an applicant’s gender may negatively impact the chances that his application for admission will be accepted”?
Everyone honest understands it to mean (2), I think. And that’s also what the court thought: Berkeley was not guilty of gender discrimination. If anything, it was guilty of discrimination against men.
But from the point of view of statistics, of course, the numbers are just numbers, and in some situations you may decide that the appropriate interpretation goes the other way. For an example, see the linked Wikipedia article’s subsection on batting averages. For a stretch of three years, Derek Jeter has a lower batting average in each individual year than does David Justice. However, when you take the total batting average over the three years, Jeter comes out better. In this case, the rational interpretation is just the opposite: it’s the aggregate numbers that matter, and Derek Jeter is better at bat than David Justice. But again, that’s only because we understand batting skill as the ability to hit a high proportion of the balls thrown at you regardless of time horizon. More years is just more evidence. Unlike with admissions, where an applicant can only apply to one department at a time, baseball batters are faced with a variety of pitchers, and their different abilities come out in the wash.
What’s probably going on with this study is similar to the error in Simpson’s Paradox. Which is to say, it’s straightforwardly the ecological correlation problem – where you compare group means that actually interact on an individual level. The authors of the study think that by mentioning selection bias, you will assume that they’ve corrected for it, when in fact they probably haven’t.
Let’s break that quote down.
It is important to note that the size of the least educated subgroup of the U.S. population has been shrinking in recent decades (down to about 8% for whites)
Yes, it really is. Because it’s on the disparity in the sizes of subgroups that Simpson’s Paradox lives! So, more and more people are getting more education, and life expectancy is increasing. However, there’s a core group of people who receive no higher education, and their life expectancy is dropping.
The bad news is that this decline in the least educated population is occurring, in part, because they die younger than their more highly educated counterparts
And yet, there’s nothing here to substantiate that. Consider that this would appear to be true even if the real reason is that more people who, in generations past, wouldn’t have gone on to higher education now do so. So, let’s imagine instead that what’s going on is that the proportion of people who get more than 12 years of education is increasing because there are more educational opportunities, and since proportion is a zero-sum game (it has to sum up to 100%), this entails that the proportion of the population that gets less than 12 years of education is shrinking. Now, let’s further imagine – as seems plausible – that the people who have newly joined the ranks of the more-educated have shorter lives than people who were traditionally in this group. Not only that, but the people who were “left behind” in the less educated group also happened to be the shortest-lived among that group traditionally. Well, in that case, the people transferred to the ranks of the nominally educated will be dragging down the average of the moderately-educated, whereas in the past they were pushing up the average of the under-educated. Basically, they’re people who make respectable, but not ideal, life choices, and in the past they stayed undereducated because of lack of opportunity, and now they don’t. By contrast, there’s a core of people who just make poor life choices, live relatively short lives as a consequence, and they were undereducated in the past and continue to be so now.
Well, this is, obviously, ripe for Simpson’s Paradox. It can be the case, depending on the number of new members in the ranks of the moderately-educated, that life expectancy decreased in every individual category, and yet increased for the population as a whole. That is, in fact, what I suspect happened, and the people who drew up this study are deliberately obscuring that point for political reasons.
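Here is a rough Python sketch of that scenario. Every number in it is hypothetical, picked only to show that the mechanism works: a “core” group that stays under-educated, a group of “movers” who would have stayed under-educated in the past but now get more schooling, and a “traditional” educated group. None of these figures come from the study.

```python
# All figures are hypothetical, chosen only to illustrate the mechanism.
# Each entry: (share of population in %, lifespan before, lifespan after).
people = {
    "core": (10, 70, 70),         # under-educated in both periods
    "movers": (30, 76, 77),       # under-educated before, educated after
    "traditional": (60, 82, 83),  # educated in both periods
}

BEFORE, AFTER = 1, 2  # tuple positions of the two lifespan columns


def mean_lifespan(names, period):
    """Population-weighted mean lifespan of the named subgroups."""
    total_people = sum(people[n][0] for n in names)
    total_years = sum(people[n][0] * people[n][period] for n in names)
    return total_years / total_people


# The only thing that changes between periods, apart from small gains in
# individual lifespans, is which education category the movers count in.
results = {
    "less educated": (
        mean_lifespan(["core", "movers"], BEFORE),
        mean_lifespan(["core"], AFTER),
    ),
    "more educated": (
        mean_lifespan(["traditional"], BEFORE),
        mean_lifespan(["movers", "traditional"], AFTER),
    ),
    "everyone": (
        mean_lifespan(list(people), BEFORE),
        mean_lifespan(list(people), AFTER),
    ),
}

for label, (before, after) in results.items():
    print(f"{label}: {before:.1f} -> {after:.1f}")
```

In this toy population nobody’s lifespan gets shorter, and two of the three groups gain a year. Even so, the less-educated average falls from 74.5 to 70.0, the more-educated average falls from 82.0 to 81.0, and the average for everyone rises from 79.0 to 79.9.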
Just to reiterate, the point about Simpson’s paradox is that human judgement is required. So, let’s say the picture I painted is correct. What conclusion do we draw from this? It seems obvious to me that we don’t conclude that lifespans are actually shortening for some people. More likely is that they’re simply failing to increase for those people, and the apparent decline is a statistical illusion. The policy implications in that case would seem to be that we maybe make an effort to get the people in this group a better education, but there is no reason to keep the age of eligibility for Social Security benefits at 65. In other words, this is the baseball case, and not the admissions case, and it’s the aggregate numbers that matter. The population is in general living longer, therefore the retirement age should be raised (assuming we keep Social Security – which seems like a bad idea to me, but neither does it seem politically expedient to get rid of it, so I’m making the assumption here that scrapping it is off the table). Since life expectancy is up by 5 years on average, there’s a case to be made for raising the retirement age to 70 – so I think people on the left should be happy they’re being offered 67. When you factor in everything else, that’s actually a real drop in the retirement age.
In order to show that the least-educated group is actually living shorter lives, you have to go beyond the group means. You have to show me something about the distribution within this group that proves that it’s getting more dangerous to lack an education than it used to be. One way you might do it is to further subdivide the least-educated by category of death. So, take the number of people who die in work-related accidents. If this number – the number of undereducated people who die in work-related accidents – is higher in absolute terms after controlling for population increase (i.e. NOT just as a proportion of the least-educated, because we’ve already shown how that can be misleading), then you might be on to something. Then it might be the case that the kinds of jobs these people have to work in are getting more dangerous, and so in some sense society is failing these people. We can, in fact, imagine a number of categories here. But the point is that you have to show that the decline in life expectancy for this group is accounted for by something other than the transfer of a subgroup that used to bring up the average out of this subgroup and into another.
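To make the difference between the two denominators concrete, here is a small Python sketch with entirely hypothetical counts (the population sizes and death counts are invented for illustration and have nothing to do with the study’s data). It compares work-accident deaths among the least-educated per 100,000 members of that group with the same deaths per 100,000 of the whole population.

```python
# Hypothetical counts, invented purely for illustration.
periods = {
    "earlier": {
        "total_population": 1_000_000,
        "least_educated": 400_000,
        "accident_deaths": 200,  # work-accident deaths among the least-educated
    },
    "later": {
        "total_population": 1_200_000,
        "least_educated": 100_000,
        "accident_deaths": 80,
    },
}


def per_100k(deaths, population):
    """Deaths per 100,000 people in the given denominator population."""
    return deaths * 100_000 / population


rates = {}
for name, p in periods.items():
    rates[name] = {
        # The measure that can mislead: the denominator shrinks as people
        # move out of the least-educated group.
        "within_group": per_100k(p["accident_deaths"], p["least_educated"]),
        # The measure suggested in the text: the same deaths, scaled by the
        # size of the whole population.
        "whole_population": per_100k(p["accident_deaths"], p["total_population"]),
    }

for name, r in rates.items():
    print(
        f"{name}: {r['within_group']:.1f} per 100k of the group, "
        f"{r['whole_population']:.1f} per 100k of everyone"
    )
```

In this made-up case the within-group rate rises from 50 to 80 per 100,000 while the population-wide rate falls from 20 to under 7. Only a rise in the second figure would suggest that the work itself had become more dangerous.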
This study fails to do that. Nowhere in their summary do they say anything that inspires confidence that they have controlled for this problem. Quite the contrary, they say weaselly things like this:
This may partially explain the trends in life expectancy we observe here, but the rising death rates are nevertheless quite real.
which strongly suggest that they have not bothered to check. After all, the “rising death rates” would appear every bit as “quite real” numerically if they were a statistical illusion, and conceding that [selection artefacts] “may partially explain the trends in life expectancy” is inadequate in the obvious way: until you can tell us how much they explain those trends – and methods are available to you to do so – you don’t know that they aren’t in fact the better explanation.
I need not point out that the authors of this study are all medical professionals and should know better. I therefore pronounce this a bit of wilful political propaganda designed to confuse the public rather than educate it.