State Covid Scorecard: Third-Quarter Report
As the COVID-19 pandemic begins to end and the political/media hacks start discussing which politicians ‘won’ the pandemic, it is important to remember the unfortunate reality that some states are better equipped to handle a public health crisis. Because of this, metrics used to compare a state’s ‘performance’ would be more informative if adjusted by prior expectations. In this way, we can attempt to measure states’ relative effectiveness of direct mitigation measures rather than pre-existing factors that were in place prior to the pandemic. In short – how much of a state’s success is due to its response and how much is due to its inherent characteristics?
Let’s be honest: nobody expects a state like Mississippi to have a lower per capita death rate than, say, Vermont – even with a flawless pandemic response. Social, demographic, and political factors greatly affect a state’s health outcomes, perhaps more than a coherent pandemic strategy.
For example, a state with a higher obesity rate would expect to have higher rates of severe illness and death given the same rate of infection (with all other factors being equal). The state with higher obesity may have an overall higher rate of death, but after adjusting for the population’s rate of obesity, the two might have performed equally well in their response to the crisis. If Mississippi were to end the pandemic with an equal death rate to Vermont, it would be fair to say Mississippi performed ‘better’ having overcome so many disadvantages.
Adjusting for Pre-Existing Conditions
Firstly, what factors would determine a state’s health outcomes independent of its actual pandemic response? Even with just the most cursory knowledge of epidemiology, it is not difficult to make an educated guess. Some characteristics that would intuitively seem important are general health of the populace (e.g., obesity or mortality rates), demographics (e.g., proportion of elderly or percent of population in nursing homes), geography (e.g., population density or climate), and social factors (e.g., political ideology, education level, and poverty).
The outcome measured is a modified version of the reported deaths per million residents. Firstly, this modified metric only includes deaths reported after June 1, 2020 (through September 30, 2021) so that the analysis focuses on states’ response after the initial wave (i.e., after effective treatment and vaccines were developed). Secondly, the metric counts a certain number of cases as equivalent deaths – specifically, 500 cases equal one death. That is, it is the number of deaths reported since June 1, 2020 plus one five-hundredth of the number of non-lethal cases reported since June 1, 2020 (the number of cases less the number of deaths reported). The rationale for incorporating cases is that a certain proportion of non-lethal cases will likely result in reduced lifespan/long-term quality of life. At the very least, there should be some penalty for temporary reduction in productivity that should be considered when evaluating performance.
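As a minimal sketch of the metric described above, the function below combines deaths and non-lethal cases at the stated 500-to-1 weighting and expresses the result per million residents. The function name and the example figures are hypothetical and chosen only for illustration; they are not any state's actual data.

```python
CASES_PER_DEATH_EQUIVALENT = 500  # weighting stated in the text


def adjusted_deaths_per_million(deaths, cases, population):
    """Deaths plus (non-lethal cases / 500), per million residents."""
    non_lethal_cases = cases - deaths
    equivalent_deaths = deaths + non_lethal_cases / CASES_PER_DEATH_EQUIVALENT
    return equivalent_deaths / population * 1_000_000


# Hypothetical state: 10,000 deaths and 510,000 cases among 5 million residents.
example = adjusted_deaths_per_million(deaths=10_000, cases=510_000, population=5_000_000)
print(round(example, 1))
```

In this made-up example the 500,000 non-lethal cases add the equivalent of 1,000 deaths to the 10,000 reported ones.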
As a first attempt and proof of concept, a multiple linear regression was fit to adjust for these factors to ‘score’ each state’s mitigation strategy. The residuals – the actual deaths per million minus the expected deaths per million – can be thought of as some combination of 1) effect of state characteristics omitted/not captured in the model, 2) effect of mitigation strategy and 3) random error. (There is, of course, some additional uncertainty in the deaths per million outcomes as not all states have the same testing and death-reporting standards; however, we’re going to assume there is no widespread and significant misreporting of cause of death by any state. Excess deaths would likely be a more accurate measure so perhaps this will be explored later.) The ‘response’ of a state includes things such as school policy, mask use, vaccine administration, limitations on capacity, etc.
All explanatory variables were centered and scaled (to a mean of 0 and standard deviation of 1) to compare the relative influence of each factor on death rates. One well-fitting model (referred to as ‘A’) that includes factors from each of the categories mentioned is:
Deaths per million = 1948 + 301 (Obesity Rate) + 256 (Poverty Rate) + 306 (Pct Nursing Home) – 169 (Healthcare Expenditure) – 237 (Relative Humidity)
The magnitude of the estimates indicates the relative influence of each parameter. The percent of population in nursing homes has the largest effect, followed by obesity rate, poverty rate, relative humidity, and per capita healthcare expenditure. As obesity, poverty, and the percent of population within nursing homes increase, so does the expected death rate, while an increase in healthcare expenditure or relative humidity is associated with a decrease (and vice versa). For example, an increase of 1 standard deviation in the obesity rate (or about 3.8 percentage points) increases the average deaths per million by 301. For every increase of 1 standard deviation in per capita healthcare spending (or $1,257 per person), states saw an average decrease of 169 adjusted deaths per million.
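The centering, scaling, and fitting steps can be sketched as follows. This is an illustration on synthetic data, not the data or code behind model A: the use of NumPy, the sample standard deviation (`ddof=1`), and the helper names `standardize` and `fit_ols` are all assumptions, and only two of the five predictors are mimicked.

```python
import numpy as np


def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(0)
# Synthetic stand-in for 50 states: two predictors on very different scales.
raw = np.column_stack([rng.normal(30, 3.8, 50), rng.normal(9000, 1257, 50)])
Z = standardize(raw)
# Made-up outcome built from coefficients like those quoted in the text.
y = 1948 + 301 * Z[:, 0] - 169 * Z[:, 1] + rng.normal(0, 50, 50)

intercept, slopes = fit_ols(Z, y)
residuals = y - (intercept + Z @ slopes)  # actual minus expected
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
```

Because every predictor is on the same unit-variance scale, the slopes can be compared directly, and the intercept equals the average outcome across states.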
(It should be noted that the inclusion of median age, proportion of population 65 or older, or proportion 80 or older does not result in a better model fit than any of these variables. As we will see later, population age – a very popular but inaccurate talking point about Florida – is of relatively low importance to overall death rate.)
In set A, the five states with the largest positive residuals are Florida, Montana, Massachusetts, Georgia, and Rhode Island. The five states with the largest negative residuals are Nebraska, Hawaii, Maine, Vermont, and West Virginia. Florida’s residual (904 deaths/million greater than expected) would translate into more than 19,400 deaths greater than expected. We could easily stop here and lazily conclude that ‘Florida is the worst when adjusting for X’ but we have standards here. Here is another model:
Deaths per million = 2095 – 335 (High School Graduation Rate) – 244 (Pct Biden Vote) + 172 (Hospital Beds per Capita) + 133 (Median Air Quality Index) – 202 (State Expanded Medicaid)
In this set, referred to as ‘B,’ high school graduation rate is most influential, followed by percent of Biden vote, per capita hospital beds, and median air quality (state expanded Medicaid is a binary variable so not exactly comparable to the others). The interpretation is the same as set A; a one standard deviation increase in high school graduation rate (or 2.7 percentage points) decreases the expected deaths per million by 335, and so forth. Adjusting for all other variables, states that chose to expand Medicaid saw, on average, 202 fewer deaths per million than states that did not.
Adjusting for these variables, the five states with the largest positive residuals are Rhode Island, Montana, Arizona, Massachusetts, and Iowa. The five states with the largest negative residuals are California, Maine, Utah, Kentucky, and West Virginia. California’s residual (659 deaths/million less than expected) would translate into 26,000 deaths ‘prevented.’
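The conversion from a per-million residual to a total number of deaths is simple arithmetic, sketched below. Florida's population of 21.5 million is the figure used elsewhere in this post; California's population of roughly 39.5 million is an assumption made here for illustration, and the function name is hypothetical.

```python
def residual_to_total_deaths(residual_per_million, population):
    """Scale a residual in deaths per million up to the state's population."""
    return residual_per_million * population / 1_000_000


# Set A residual for Florida, with the 21.5 million population used in this post.
florida = residual_to_total_deaths(904, 21_500_000)
# Set B residual for California; the 39.5 million population is an assumption.
california = residual_to_total_deaths(-659, 39_500_000)
print(florida, california)
```

A positive total reads as deaths beyond expectation and a negative total as deaths ‘prevented’ relative to expectation.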
These plots show each state’s expected value against the actual observed value for both sets of variables. An observed value above (below) the diagonal line may suggest a ‘worse’ (‘better’) response. With an R² value of 0.68 and 0.77 for sets A and B, respectively, we can safely conclude that a large majority of the total variation in death rate can be explained with just a handful of pre-existing, uncontrollable factors. The current leader in per capita deaths, Mississippi, probably never had a chance at avoiding a terrible death toll while a state like Vermont was set up for success decades or even centuries ago.
It should be stressed that the above linear models are just a few of many plausible ones. If variables were changed or omitted or transformed, we would get a different residual for each state. Using another statistical model besides linear regression would also change the results. The goal is to get an average residual for each state across several well-fitting models, as well as discovering which characteristics are most associated with a state’s performance. The ‘true’ effect of a state’s characteristics is not likely captured by the two above equations, so it is best to get input from a diverse set of models that can handle complex interactions between variables.
To achieve this, five different models were used: gradient boosting machine (GBM), random forests (RF), support-vector machine (SVM), LASSO, and an ensemble of best subset linear models (BSLM). Although prediction is not the purpose, ten-fold cross-validation was performed to tune some of the parameters in the GBM and SVM to avoid extreme over-fitting. To select the best subset of variables in the BSLM method, 2000 random subsets of between one and five variables were sampled, with the fitted values of the best 20 unique subsets averaged (measured by AIC, with any variable with maximum VIF>5 iteratively removed and the model refit). To get a distribution of each state’s residual, this process was repeated with 500 bootstrap samples for each model, or 2500 bootstrap samples in total.
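The BSLM procedure described above might look roughly like the sketch below. It is a simplified illustration, not the original implementation: it uses NumPy only, computes a Gaussian AIC up to an additive constant, applies the VIF filter before fitting rather than after, and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Return fitted values and residual sum of squares for OLS with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return fitted, float(np.sum((y - fitted) ** 2))


def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant."""
    return n * np.log(rss / n) + 2 * n_params


def drop_high_vif(X, cols, threshold=5.0):
    """Iteratively drop the column with the largest VIF while it exceeds the threshold."""
    cols = list(cols)
    while len(cols) > 1:
        corr = np.corrcoef(X[:, cols], rowvar=False)
        vifs = np.diag(np.linalg.inv(corr))  # VIFs are the diagonal of the inverse correlation matrix
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        cols.pop(worst)
    return tuple(cols)


def best_subset_ensemble(X, y, n_draws=2000, keep=20, max_size=5, seed=0):
    """Average the fitted values of the `keep` unique random subsets with the lowest AIC."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scored = {}
    for _ in range(n_draws):
        size = int(rng.integers(1, min(max_size, p) + 1))
        cols = tuple(sorted(int(c) for c in rng.choice(p, size=size, replace=False)))
        cols = drop_high_vif(X, cols)
        if cols in scored:
            continue  # only unique subsets are scored
        fitted, rss = fit_ols(X[:, list(cols)], y)
        scored[cols] = (aic(rss, n, len(cols) + 2), fitted)
    best = sorted(scored.values(), key=lambda item: item[0])[:keep]
    return np.mean([fitted for _, fitted in best], axis=0)


# Synthetic demo: 50 "states", 8 candidate variables, 2 of which matter.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 8))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(0, 0.5, 50)
ensemble_fit = best_subset_ensemble(X_demo, y_demo, n_draws=300, keep=5)
ensemble_residuals = y_demo - ensemble_fit
```

Averaging fitted values over several good subsets, rather than picking a single winner, keeps any one arbitrary variable choice from dominating a state's residual.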
The basic theory is that a consistent deviation from the expected death rate across several models (and many thousands of resamplings of the data to achieve slight variations in the parameter estimates and thus residuals) is most likely an effect of the state’s response. Florida, for example, had a negative residual across the 2500 bootstrap samples less than 1% of the time. The plot below shows the distribution of the bootstrap residuals for each state.
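The bootstrap idea can be illustrated with a plain linear model: resample the states with replacement, refit, and record every state's residual under each refit. The sketch below assumes a case-resampling bootstrap and uses synthetic data, with one row deliberately made much worse than its predictors imply; the original analysis may have resampled differently, and `bootstrap_residuals` is a hypothetical name.

```python
import numpy as np


def bootstrap_residuals(X, y, n_boot=500, seed=0):
    """Refit OLS on resampled rows; return residuals for all rows, shape (n_boot, n)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    out = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        coef, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        out[b] = y - design @ coef  # actual minus expected, for every row
    return out


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 2000 + 300 * X[:, 0] - 200 * X[:, 1] + rng.normal(0, 100, 50)
y[0] += 900  # hypothetical state doing far worse than its characteristics predict

res = bootstrap_residuals(X, y)
share_negative = (res < 0).mean(axis=0)  # how often each row beats expectations
mean_residual = res.mean(axis=0)
```

A row whose residual is almost never negative across resamples, like row 0 here, is the pattern described for Florida above.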
Florida still has one of the highest residuals when averaged across all models, currently in third-worst place behind Arizona (by far, the worst state thus far) and Rhode Island; Florida is likely to have moved into second-worst place since this was published. On the other end of the spectrum there are the ‘usual suspects’ – Hawaii, Vermont, and Maine in first, second, and third place, respectively. Other states of note are California (18th best), New York (26th best), and Texas (31st best), all at about average performance (which is not unexpected in large, diverse states). Despite vastly different raw deaths per million, New York and Louisiana (25th) are scored about the same in this analysis, with Louisiana having just a slightly more negative average residual. That is an admirable result for Louisiana considering the consecutive brutal hurricane seasons and the relative performance of its peer states.
This ordering looks a lot like the unadjusted ranking of post-June 2020 deaths per million. This isn’t surprising given that a significant amount of the variability of death rate can be explained by pre-existing factors (>70% very easily achieved with a simple linear model, with probably some confounding of actual policy within the variables included); there is only so much a state can overcome (or screw up). Louisiana, for example, is fifth highest in post-June death rate – but that is exactly what is expected when adjusting for its characteristics, hence the average ranking of performance. Rhode Island and Massachusetts, despite doing quite poorly by their own standards, are still only mediocre at 33rd and 24th lowest in unadjusted death rate, respectively. West Virginia (34th lowest unadjusted to 22nd) and Kentucky (30th to 11th) were big movers in the good direction. Perhaps it is coincidental, but Jim Justice (governor of West Virginia) and Mitch McConnell (Pale Man of Kentucky) have been some of the least insane Republicans dealing with COVID-19.
These results suggest how little government policy affects this outcome beyond expectations. After nearly two years, two administrations, and a lot of confusing messaging from the government and disinformation from right-wing media, there is only so much a government can do to influence the behavior of millions of people. Imagine the nightmare of overseeing California or Texas (or any large, diverse state) and having to tailor your response to all the diverse subpopulations living there. However, this isn’t a suggestion to have a ‘freedom over fear’ policy because many thousands of deaths are probably avoidable – especially in reckless states like Florida with a large population. Small nudges to incentivize or discourage behavior can have great compounding effects.
Paradoxically, it seems that states with less need for mandates and government intervention adopt stricter policies while states that could benefit from a more ‘heavy-handed’ approach tend to eschew precautions. This is a classic example of confounding in that a state’s characteristics affect both its policies and the actual outcome itself. For example, states that opted into Medicaid expansion are more likely to mandate masks in schools, thus complicating some of the inference. Nevertheless, the ability to successfully respond to a crisis is very much determined by the cumulative effect of thousands of decisions made by a state’s voters and its government over decades.
If Louisiana had behaved more like Florida or Arizona, thousands more people would likely be dead today. Conversely, if Florida had just met expectations, 10,000 more people would possibly be alive today (average residual of 476 deaths/million x 21.5 million). California, although just slightly above average in performance, may have prevented the most deaths (nearly 5,000) due to its size. By this analysis, Ron DeSantis has overseen more preventable deaths than any other governor – and by a wide margin.
|State||Mean Residual (deaths/million)||Total Deaths Caused (+)/Prevented (-)||State||Mean Residual (deaths/million)||Total Deaths Caused (+)/Prevented (-)|
Important State Characteristics
Which factors determine each state’s performance? The table shows a very crude and simplistic estimate of the relative importance of each of the variables considered. ‘100’ means that this variable was most important within the model (as defined by some criterion, such as mean decrease in prediction accuracy when removed), with all other variables scaled to this maximum value. Note that differences in the criteria and in the nature of each algorithm make these values not exactly comparable as presented. Values should not be compared across methods, only within a method relative to its other variables. For example, Incarceration Rate is 60% as important as Percent of Population with High School Diploma within the GBM, while this ratio is 95% within the Random Forest. The LASSO model performs variable selection, making many variables have zero influence, while SVM and RF have more equal influence among all variables.
|Rank||Variable||Average Variable Importance Ratio||GBM||RF||LASSO||SVM||BSLM|
|1||Pct Pop High School Diploma||94.6||100||100||100||73||100|
|3||Pct Pop Bachelor’s Degree||57.3||68.4||94.6||25.1||54.3||44|
|4||Hospital Beds per Capita||54.3||24.3||52.4||44.9||100||49.8|
|5||Pct Biden Vote||47||27.6||59||34.8||55.3||58.3|
|6||Nursing Home Residents per Capita||39.4||21.2||54.6||34.5||54.5||32.4|
|7||Heart Mortality Rate||36.4||41||80.3||18.5||30.1||12.1|
|8||Median Air Quality Index||35.7||7.2||46.7||33.7||57.7||33.1|
|10||State Expanded Medicaid||27.9||2.4||18.2||55.1||37.5||26|
|11||Pct Pop Black||25.1||7.6||55||6.9||39.5||16.8|
|13||Respiratory Mortality Rate||20||13.4||46.6||4||28.3||7.9|
|14||Average Relative Humidity||19.3||2||34.4||6.1||38||15.9|
|16||Alzheimer Mortality Rate||18.3||7.8||60||0.7||20.3||2.8|
|17||Pct Workforce ‘White Collar’||17.1||3.9||46.5||0.1||23||12.2|
|18||Median COL-Adjusted Income||16.6||6||43.9||1.1||26.6||5.5|
|19||Healthcare Expenditure per Capita||16.5||4.9||33.4||2.1||35.8||6.4|
|20||Pct Pop White||16.2||3.7||40.3||1.4||23.8||11.7|
|21||Pct Public Transport Use||15.8||7.1||51.5||0||15.1||5.5|
|22||Cancer Mortality Rate||15.6||7.6||36.1||0||30.4||3.7|
|23||Particulate Matter (PM2.5 Days)||15||9.2||34.9||0.2||28.1||2.6|
|27||Average Dew Point||14.2||2.9||44.6||0||16.2||7.3|
|30||Pct Pop HUD Section 202 Housing||13.3||6.5||29.8||1.3||25.7||3.3|
|31||Pct Pop in Medicaid||12.7||8.2||26.2||0.2||21.7||7|
|32||Pct Pop in Medicare||12.6||2.8||34.2||0.9||21.6||3.5|
|33||Average Household Size||12.6||1||27.7||0.6||26.3||7.6|
|34||Pct Pop 80 or Older||12||3.6||28||0.2||21.5||6.9|
|35||Pct Pop 65 or Older||12||2.6||32.1||0.5||20.8||4.3|
|36||Pct Workforce Agriculture||12||2.4||34.6||0.6||17.2||5.1|
|37||Pct Workforce Arts, Entertainment, Recreation, Accommodation & Food||11.3||3.3||27.8||1.4||20.2||3.8|
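The scaling described before the table, and the averaging across the five methods, can be sketched as below. The three rows are copied from the table (in the order GBM, RF, LASSO, SVM, BSLM), and a simple mean of the five columns appears consistent with the ‘Average Variable Importance Ratio’ shown for them. The function names and the raw importance values passed to `scale_to_max_100` are hypothetical.

```python
def scale_to_max_100(raw_importance):
    """Rescale one model's raw importances so the top variable equals 100."""
    top = max(raw_importance.values())
    return {name: 100 * value / top for name, value in raw_importance.items()}


def average_importance(scaled_values):
    """Simple mean of a variable's scaled importance across models."""
    return sum(scaled_values) / len(scaled_values)


# Made-up raw importances for a single model, for illustration only.
scaled_demo = scale_to_max_100({"var_a": 0.5, "var_b": 0.2})

# Scaled values copied from the table: GBM, RF, LASSO, SVM, BSLM.
table_rows = {
    "Pct Pop High School Diploma": [100, 100, 100, 73, 100],
    "Hospital Beds per Capita": [24.3, 52.4, 44.9, 100, 49.8],
    "Pct Biden Vote": [27.6, 59, 34.8, 55.3, 58.3],
}
averages = {name: round(average_importance(vals), 1) for name, vals in table_rows.items()}
print(averages)
```

Because each column is rescaled to its own maximum, the average rewards variables that rank consistently high across methods rather than those with a large raw score in one method.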
We’ll go through a handful of the most important variables and try to analyze why each might affect death rates or, at the very least, be associated with them. Most of these are self-explanatory, as they simply tell the story of American inequality and poverty. They are all correlated and are both causes and effects of one another (like education level and incarceration). Or they could simply be a sort of proxy variable that only correlates with the true effect that causes death. Either way, they can be used as an effective adjustment method.
1. Pct Pop. High School Diploma
As we can see from the table, the best predictor of a state’s death rate is the percent of a state’s population that has graduated high school. Remember – this is not necessarily causal, but more likely indicative of the social, political, and cultural aspects of a state. If you don’t have a high school diploma, you’re more likely to be poor, lack healthcare, be non-white, or work in a job that cannot be done remotely (or that lacks sick leave, paid vacation, etc.). High school dropouts are also more likely to end up in prison.
2. Incarceration Rate
This likely has a causal component (prisons and jails are covid hot spots), so having more of the population locked in confined spaces – usually with inadequate healthcare – will increase the death rate. Predictably, states with a higher percentage of minorities (higher mortality) tend to have higher rates of incarceration. States with a high incarceration rate are, perversely, more likely to adopt a ‘freedom’ strategy that bans mask and vaccine mandates, i.e., anything that might take societal cooperation or compassion for others. Only in America, right?
3. Pct Pop. Bachelor’s Degree
This is similar to the percent of people with a high school diploma. College graduates earn more money on average, along with all the benefits associated with higher income and more often a ‘white collar’ profession (this is defined as employees in the ‘Information,’ ‘Finance and Insurance,’ and ‘Professional, scientific, and technical services’ categories from the Bureau of Labor Statistics). College graduates are probably also less likely to be susceptible to Fox News propaganda and Facebook posts from your crazy right-wing uncle. They are more likely to live in higher-taxed states with more investment in social and healthcare spending. Also, they are less likely to smoke, be obese, or have chronic health problems like diabetes. Stay in school, kids!
4. Hospital Beds per Capita
You might think that having a high number of hospital beds would be good during a pandemic – and it might be in certain situations – but it is associated with a higher death rate. Hospitals are businesses, so they have more beds in states where there is higher demand, driven by a greater number of unhealthy people (or extremely old and frail people, i.e., those in nursing homes).
5. Percent Biden Vote
This one has been discussed at length elsewhere. Democrats are more likely to be vaccinated, wear a mask, and take preventative measures in general. They also tend to not ridicule, threaten, or be dismissive of scientists who have been studying these sorts of things for decades. They probably do the opposite of whatever ‘don’t Fauci my Florida’ is. Dying to own the Libs!
6. Nursing Home Residents Per Capita
Like prisons and jails, covid entering a nursing home will almost assuredly end in disaster (barring wide-scale vaccination). The more people living in nursing homes, the higher the expected death rate. Other than this seemingly causal link, states with a higher percentage of population in nursing homes also tend to have more chronic conditions, more elderly residents, and more Republicans – all of which are hazardous to one’s health.
I think you get the idea: states with generally worse quality of life and socioeconomic conditions have fared worse than those with better ones. In other words, COVID-19 is just the latest massacre in America’s Class War – and the poor, vulnerable, and misguided (of all races) have paid the price once again.
The Myth of Population Age
With the recent dramatic rise of Florida in the ranks of death rate, a popular talking point has emerged: “Actually, Florida has a lower death rate than California/Blue Boogeyman State when adjusting by age of population.” (They really like to compare everything with California, for some reason).
This might make sense if the age-weighted death rate were a statistic with intrinsic value, but it is merely an artificial index that is insufficient on its own for accurately comparing states. Invoking ‘age-adjustment’ might also make sense if population age were the primary predictor of population death rate, but it’s not. As seen in the table of variable importance, none of the three age variables – median age, percent of population 65 or older, and percent 80 and older – are anywhere near the top for any model.
Going back to the best subset linear models, let’s see how often age variables produce the best-fitting models. 50,000 unique random subsets of between one and five variables were chosen and the 100 unique subsets with lowest AIC were kept (any variable with maximum VIF>5 was iteratively removed and the remaining variables refit). Pct Pop 80 or Older, Pct Pop 65 or Older, Median Age, and Pct Pop in Medicare were in just 5%, 4%, 2%, and 2%, respectively, of the best 100 subsets (compared to 93% with High School Graduation Rate, 64% with Per Capita Hospital Beds, and 48% with Percent Biden Vote). Their inclusion could be nothing more than random chance. LASSO does not select any of the four age-related variables mentioned, either.
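Counting how often a variable appears among the best subsets is a simple tally, sketched below. The four subsets and the variable names are made up for illustration and are not the actual top 100 subsets from this analysis.

```python
from collections import Counter


def inclusion_share(best_subsets):
    """Fraction of the kept subsets in which each variable appears."""
    counts = Counter(var for subset in best_subsets for var in set(subset))
    return {var: count / len(best_subsets) for var, count in counts.items()}


# Hypothetical top subsets, for illustration only.
best = [
    ("hs_grad", "hospital_beds"),
    ("hs_grad", "biden_vote"),
    ("hs_grad",),
    ("median_age", "hospital_beds"),
]
shares = inclusion_share(best)
print(shares)
```

A variable that shows up in only a few percent of the best subsets, as the age variables do, contributes little that the other variables do not already capture.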
While age may be an individual-level risk factor, there is no evidence that the population’s age distribution has any effect on the overall death rates, which contradicts the entire rationale for age-weighting (namely, that the older the population, the more deaths are expected, so this needs to be adjusted for). In fact, having an older population may be advantageous (e.g., through higher vaccination rates, increased physical distancing, or fewer school children). Residents of states with a higher proportion of elderly may be more cognizant of risks – not only to themselves, but to the elderly around them, such as grandparents or neighbors. Older states may have increased awareness of or access to vaccines for the general population due to prioritizing vaccinations for the large elderly population.
But no matter the cause, when you compare Florida to its elderly peers (such as Maine, Vermont, West Virginia, etc.), it has underperformed.
What Went Wrong in Florida?
It’s important to remember which factors are associated with death rates: education, healthcare, poverty, etc. Despite the Florida Man meme and jokes about rednecks wrestling alligators, meth, and general trashiness, Florida is a nice and relatively average state in many metrics that would determine public health outcomes. Yes, there is some Alabama mixed in here or there, but it is almost kind of approaching being a first-world place. It is also one of the most unique states (as seen in the dendrogram which shows clusters of similar states based on the characteristics used in this analysis).
It doesn’t have the worst high school graduation rates (-0.6 standard deviations from average) and its per capita incarceration is a little high, but nothing outrageous (0.7). It’s only slightly below average for population with a bachelor’s degree (-0.3) but it is above average for ‘white collar’ jobs (0.6). It has below average age-adjusted mortality: heart disease (-0.8), cancer (-0.7), respiratory (-0.6), and Alzheimer’s (-1.4), as well as obesity (-0.6). Overall, it has some positive things going for it despite some problems. It’s a purplish state that has plenty of normal, center/center-left people to counteract the fascists. By all accounts, Florida should have been below average – but not one of the worst. It certainly was not expected to be comparable to its Gulf Coast neighbors with their considerably worse quality of life indices.
We’ll never be able to definitively conclude the exact reasons Florida became one of the worst states, but there are some obvious guesses. Whether it was anti-mask hysteria, a lackadaisical vaccination strategy, or dangerous rhetoric from ‘leaders’ about the nonexistent risk to the non-elderly, the cumulative effects are clear: Florida has had one of the worst results after adjusting for its characteristics (and one of the worst overall with no adjustments). Whether this result holds through another winter surge is unknown, but in no way could Florida be considered a success story or a model to emulate.