The Fourth Surge – What’s up with Michigan?


The Fourth Surge may or may not be upon us (more on that below). But it is certainly clear that new COVID-19 case numbers are growing unchecked in a few states. In the plot above, all the arrows pointing up and right – the top-right quadrant – represent states with increasing new case numbers in both March and April 2021. Biggest by far: Michigan. The 10 April rate of 73.6 new cases per 100,000 is nearly four times the national rate of 20.5 new cases per 100,000 population. You can see it in the plot: the Michigan arrow is almost four times as long as the National arrow. New case rates in Michigan have been growing by two to four additional new cases per 100,000 each day since the start of March. That is, from fewer than 15 new cases per 100,000 each day to nearly 75 new cases per 100,000 each day in just 40 days.

All of the states listed on the right side saw an April increase in the number of new cases per 100,000 population. Which means the states on the left side all saw a decrease in April – good news! The states in the lower-left quadrant saw decreases in new cases over both March and April. That may be why all their arrows are so short :).

About that Fourth Surge. At the national level, the numbers of new cases in April have erased all the improvements of March, which is why the National arrow in the plot above points down and to the right. But it’s not necessarily a trend yet – new case numbers have hovered around 20 new cases per 100,000 population before. Clear in the plot below, however, is that the number of new cases per 100,000 population just leapfrogged over the somewhat stable orbit from early March. It remains to be seen whether this is a trend (that is, a surge) or part of a much bigger orbit around the average March case load numbers. See my website for more details.

Another Update on the Fourth Surge

Copyright (c) 2021 D.S.Dixon
Data from the New York Times database.[1]

The April uptick looks like it may be settling into an orbit between the early-March orbit and the late-March orbit. If so, that’s a really good thing, and a sign that most Americans are still taking COVID-19 seriously. We are likely beginning to see the benefits of widespread vaccination, as well. The individual states are still all over the place in terms of trends. See my web page for more information.

[1]The New York Times. (2020). Coronavirus (Covid-19) Data in the United States. https://github.com/nytimes/covid-19-data

Update on the Fourth Surge

Copyright (c) 2021 D.S.Dixon
Data from the New York Times database.[1]

The first three days of April have seen some movement away from the stable orbit at about 17.6 new cases per 100,000 population. This isn’t a surge on the order of the holiday surges from 7 November 2020 to 16 January 2021. While some states continue to see falling new case rates, many states with steady or decreasing new case rates in March have also seen this uptick in April. See my web page for more information.

[1]The New York Times. (2020). Coronavirus (Covid-19) Data in the United States. https://github.com/nytimes/covid-19-data

No new COVID case surge – yet

There has been much talk about a new surge in COVID-19 cases, dubbed the Fourth Surge by CDC director Dr. Rochelle Walensky on 29 March 2021. So I checked it out.

I’ve been watching the dynamics of new COVID-19 cases in the U.S. for several months, posting daily updates to national and state plots on my website. Here’s a plot of the number of new cases per 100,000 population versus the daily change in the number of new cases per 100,000 population as of 25 March 2021 (posted on 26 March 2021).

The data come from The New York Times, based on reports from state and local health agencies[1]. The arrows show the daily changes color-coded by month, starting with January 2021, on the right, and ending with March 2021, on the left. Arrows in the top half point up and right to show increases in the number of new cases, while arrows in the bottom half point down and left to show decreases in the number of new cases. When the number of new cases is changing every day but averaging about the same over time, the arrows form a kind of circle (I call it an orbit). Note the orbit at the far right in mid January 2021, centered at about 70 new cases per 100,000 population.

Let me point out that these data are averaged over 18 days, while the CDC tends to use a seven-day average. The seven-day average tends to be too jittery to track daily changes for this purpose. The jitters in the seven-day average are a result of the various ways that individual states report COVID-19 data, combined with a weird thing that happens with these kinds of data: events tend to be recorded on certain days of the week even if they happened a day or two before.
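For readers who want to reproduce the quantities plotted here, a minimal sketch in R – my own reconstruction, not the exact pipeline behind the plots – assuming the national time-series file us.csv in the NYT repository cited below and an approximate U.S. population of 331 million:

    # Sketch: the phase-plot quantities described above, from the NYT national file
    us <- read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv")

    us_pop    <- 331e6                  # approximate U.S. population
    new_cases <- c(NA, diff(us$cases))  # daily new cases from cumulative counts

    # 18-day trailing average of new cases per 100,000 population (horizontal axis)
    per100k  <- 1e5 * new_cases / us_pop
    smoothed <- stats::filter(per100k, rep(1/18, 18), sides = 1)

    # Daily change in the smoothed series (vertical axis)
    daily_change <- c(NA, diff(smoothed))

    # Each (smoothed, daily_change) pair is one point along the trajectory in the plot
    tail(data.frame(date = us$date, smoothed = as.numeric(smoothed),
                    daily_change = as.numeric(daily_change)))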

This plot shows an overall trend of rapidly decreasing numbers of new cases since 16 January 2021, when the rate peaked at over 71 new cases per 100,000 population, falling to around 20 new cases per 100,000 population by March 2021. In March that trend ended, however: the rate stalled at about 20 new cases per 100,000 population, then slowly decreased to slightly less than 18 new cases per 100,000 population on 23 March 2021.

Then came two days of increases – 24 and 25 March – implying a worsening: a possible new surge? Fast forward to 31 March 2021, shown in the next plot.

Note that the uptick on 25 March was followed by a downtick on 26 March, and the subsequent five days saw a tight orbit at approximately 19 new cases per 100,000 population. Dodged that bullet!

At the national level, anyway – some states are seeing rapid increases in new cases that may be a sign of worse times to come. See my website for details.

[1]The New York Times. (2020). Coronavirus (Covid-19) Data in the United States. https://github.com/nytimes/covid-19-data

COVID-19 Holiday Surge

Copyright (c) 2021 D.S.Dixon
Data from the New York Times database.

One way to look at the pace of the COVID-19 pandemic is in terms of the rate of change. If the rate of change is positive, the number of cases is increasing, and that’s bad. If the rate of change is decreasing, that’s better. But it’s not good until the rate of change is negative. That is, fewer cases each day.

The plot above shows the rate of change as a daily growth rate: the fractional change each day. Multiplied by 100 it’s a percent change each day. Since June that rate, for the U.S., has been between 1% per day and 2% per day. That is, one to two percent more people infected each day. That’s not good. If it goes on long enough at that rate, everyone gets infected, though that’s not going to happen because some people are immune for one reason or another.
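A minimal sketch of that growth-rate calculation in R – again assuming the NYT national file us.csv; the plot itself also includes the individual state series:

    # Sketch: daily growth rate of total U.S. cases (fractional change per day)
    us <- read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv")

    total_cases <- us$cases
    growth_rate <- c(NA, diff(total_cases) / head(total_cases, -1))

    # Percent per day over the window plotted above (7 Nov 2020 - 7 Jan 2021)
    dates <- as.Date(us$date)
    win   <- dates >= as.Date("2020-11-07") & dates <= as.Date("2021-01-07")
    summary(100 * growth_rate[win])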

You can find more plots, including individual state plots, on my website. The plot above just focuses on the period between 7 November 2020 and 7 January 2021. The thicker dashed line is the daily growth rate in total U.S. cases. Each thinner line is a U.S. State, the District of Columbia, Guam, Northern Mariana Islands, Puerto Rico, or Virgin Islands.

While the number of cases has been growing between one and two percent on average over this time, there are clearly three different trends here. One is the surge from 7 November to 2 December, labeled Halloween above, the second is the surge from 2 December to 30 December, labeled Thanksgiving, and the third is just starting on 30 December 2020.

I will be adding additional analysis on my website, but there are at least two things to take away from this snapshot. One is that many states clearly participated in the surges shown, some much more than others. The other takeaway is that these surges run for around 30 days, starting about a week after a major holiday.

What if endogenous growth?

If you’re not an economist – or at least someone who took a macroeconomics course at the intermediate level or above – it’s unlikely you know what that means. First, however, the picture.

How national economies grow is a bit of a holy grail for macroeconomists – the people who study the whole economy of a nation. Robert Solow came up with a beautifully simple model of economic growth in 1956, for which he got a Nobel prize in 1987 (technically, the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel). Edmund Phelps, Nobel prize recipient in 2006, published a delightful fairy tale based on Solow’s growth model in 1961, from which we got the now-famous Golden Rule of economic growth. Many useful economic models came from these, and the basic Solow growth model is still used in contemporary macroeconomic analysis.

The Solow growth model and its many direct descendants assume that “growth happens”. That is, something makes the economy grow and we just put that into the model. We call these exogenous growth models (exo = outside, genous = created).

Some macroeconomists, however, were eager to find a model of the national economy that incorporated the thing that made it grow. We call these endogenous growth models (endo = inside). For example, a close descendant of the Solow growth model is the Ramsey–Cass–Koopmans model, which is partially endogenous.

An early proponent of endogenous growth theory is 2018 Nobel prize recipient Paul Romer. Like many great ideas, it started simply. In 1986 Romer proposed an incredibly (as in, not expected to happen in real life) simple model in which national output is a linear function of capital alone: $$Y=AK$$ where $Y$ is national product (GDP), $K$ is total national capital stock, and $A$ is a constant that converts one into the other.

From a practical point of view, there are a few problems with this, not least of which is the constant returns to capital. This, if nothing else, is a violation of the laws of thermodynamics, and we all know that physics always wins!

But what if this model – called the AK model for obvious reasons – really applied to the U.S. economy? That’s what the plot at the top of this post is about. The top solid line is real capital stock in billions of 2011 dollars – about 56 trillion dollars in 2017. The lower solid line is real GDP in billions of 2012 dollars – about 18 trillion dollars in 2017. The fact that they’re measured in base-year dollars one year apart doesn’t really matter – there’s a lot of lag in the national economy. These are both plotted against the left axis.

The dashed line in the plot is the ratio of the two and represents A in the AK equation. This is plotted against the right axis. It’s coincidental that the capital stock plot appears to trace the average of the dashed line plot, but it helps to see the trend. That is, were the AK model to apply to the U.S. economy, A has certainly not been constant over the past 67 years, not even on average.

So here’s the endogenous growth part of the AK model. Clearly $$dY=AdK$$

where $dY$ is the change of production (growth in the economy) and $dK$ is the change of capital stock. What we know about the change of capital stock is that it’s a net change, with investment coming in and depreciation going out $$dK=sY-\delta K$$ Here $s$ is the national saving rate (the fraction of income that households save) and $\delta$ is the depreciation rate (the fraction of capital stock that wears out or is used up each year).

Now this is another gross simplification, assuming, among other things, a closed economy (no imports or exports) so that total investment equals total household savings, sY. It also assumes no taxes or government spending (hard to picture, but taxes and government spending can be subtracted without a fundamental change to the model), and a constant rate of depreciation.

To get the economic growth rate, divide change of production by total production $$\frac{dY}{Y}=A\frac{sAK-\delta K}{AK}$$ where on the right side, Y has been replaced by AK, since they’re equal. This simplifies to $$g_Y=sA-\delta$$ where $g_Y$ is the economic growth rate, $\frac{dY}{Y}$.

To do something with this, we have to estimate the household saving rate $s$ and the depreciation rate $\delta$. For the past 40 years the saving rate has been below 10%, with a weird spike to 33% in April 2020 (https://fred.stlouisfed.org/series/PSAVERT). The national depreciation rate on fixed assets was about 5% of total capital in 2017 (https://fred.stlouisfed.org/series/M1TTOTL1ES000). Assuming $s=0.10$ and using $A=0.32$, the 2017 number from the plot above, we get $g_Y=0.1 \times 0.32-0.05=-0.018$, or negative 1.8%. All of these estimates except A are averages over the past few decades, when growth has averaged about 3%. Using the 2017 estimate for A should have given us the highest possible growth rate.
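A quick numerical check of that estimate in R, using the same rounded 2017 figures quoted above:

    # Back-of-the-envelope AK growth rate with the 2017 values quoted above
    K <- 56e3   # real capital stock, billions of 2011 dollars (~$56 trillion)
    Y <- 18e3   # real GDP, billions of 2012 dollars (~$18 trillion)
    A <- Y / K  # ~0.32, the dashed line in the plot

    s     <- 0.10   # household saving rate
    delta <- 0.05   # depreciation rate on fixed assets

    g_Y <- s * A - delta   # implied growth rate of the AK economy
    g_Y                    # about -0.018, i.e. -1.8% per year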

Clearly, the basic AK model does not approximate the US economy very well at all. Are there national economies it better approximates? That’s a topic for another time. Additional topics for another time: how to incorporate human capital and intellectual capital. While human capital may have the diminishing returns we expect (physics expects!) from physical capital, there’s no reason for intellectual capital to exhibit diminishing returns. SPOILER ALERT: this may be a way to get a real economy to look a lot more like the AK model.

What’s the risk of COVID-19 in New Mexico?

I’m not a doctor (not in medicine, anyway) but I am something of a specialist in probability and statistics. These are my thoughts on a low-probability, high-impact event: COVID-19 infection.

Let’s start with historical data, but the important idea is that historical statistics are predictors of the past, not the future. They are based on a state of the universe that existed some time before the statistics were taken, and the universe has changed – a lot – since then. The numbers behind the statistics are growing, and the thing that matters – and we don’t know – is the growth rate of those numbers.

Consider the portion of the population already infected with COVID-19 (as we will in a couple of paragraphs). That number is changing all the time.

Popular media like to use the term “exponential growth” to mean something like “a lot”, but that’s not the point. If the growth rate is near zero, then it takes a very long time for the population to increase even with exponential growth. If the growth rate is high, then the population is increasing in size rapidly.

What is the risk of contracting COVID-19 in New Mexico based on historical statistics? How does it compare to neighboring Western states? If we assume completely random interactions across the whole state, here is the probability of exposure per interaction with another person, based on infection rates on the morning of 15 March 2020:

NM 0.00000621
OR 0.00000835
WA 0.00008055
CA 0.00000956
ID 0.00000114
UT 0.00000285
CO 0.00001791
AZ 0.00000167

Those are really small numbers, and kind of hard to visualize. What’s the biggest crowd I anticipate encountering? Less than 10,000 – I hope! So, in a crowd of 10,000, how many would be infected?

NM: less than two-thirds of a person
OR: slightly more than four-fifths of a person
WA: slightly more than eight people
CA: slightly less than one person
ID: slightly more than one-tenth of a person
UT: slightly more than one-fourth of a person
CO: a little less than two people
AZ: slightly more than one-sixth of a person

What we don’t know is the probability of infection given exposure, which is certainly less than 1.0. Worst case, assume 1.0, so these are also the risks of infection.
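For concreteness, here is a minimal R sketch of that arithmetic – the expected number of infected people in a crowd, and the chance of encountering at least one – using the per-interaction probabilities from the table above. The crowd size is an assumption, and every result scales with it:

    # Sketch: expected infected people in a crowd, and P(at least one), per state
    p_exposure <- c(NM = 0.00000621, OR = 0.00000835, WA = 0.00008055,
                    CA = 0.00000956, ID = 0.00000114, UT = 0.00000285,
                    CO = 0.00001791, AZ = 0.00000167)

    crowd <- 10000   # assumed crowd size; results scale with this choice

    expected_infected <- p_exposure * crowd
    p_at_least_one    <- 1 - (1 - p_exposure)^crowd

    round(cbind(expected_infected, p_at_least_one), 3)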

In reality, the probability of encountering the infection is much, much higher. Testing is only done on a very small fraction of individuals with symptoms, so there are a lot more infected people out there. And an unknown number of carriers are asymptomatic. And while we’re at it, let’s take a closer look at what we mean by asymptomatic. Literally, it means “not showing symptoms”, but in this context it means “a thriving community of the virus that hasn’t made the host sick — yet.”

So what if the probability were ten times greater? Reduce the size of the crowd above to 1,000. Here in NM, if you want a better than 50/50 chance of avoiding infection, avoid crowds of 500 or more.

But these are all probabilities – your mileage will vary. You could meet only one person in a day but it could be the wrong person.

To reduce your exposure, stay home. When you do go out, avoid crowds. Even little crowds.

Assume you are exposed when you go out. These are ways to reduce your probability of infection:

  • DON’T TOUCH YOUR FACE
  • Wash your hands when you get home.
  • Disinfect your cellphone before you wash your hands (you’re likely to have handled your cellphone while you were in a higher-risk environment like the supermarket). Don’t touch the disinfected phone before you wash your hands.
  • Avoid crowds and, more importantly, close proximity to others. Self-check-out at the supermarket looks better, doesn’t it? Not so much because there’s no checker opposite the register, but there’s no one standing right behind you in line, at least while you’re scanning.
  • Ordering online looks even better, no? DISINFECT ANY PACKAGES YOU RECEIVE. That package has been a lot of places and handled by a lot of people.

The Changing U.S. Economy

An assignment in my Spring 2018 senior-level macroeconomics course had my students look up the total capital expenditure and total labor expenditure by economic sector. Each student was assigned a different year. I gave them a spreadsheet that ordered the sectors by labor/capital ratio and plotted them in a unit box. While the students were presenting their results, it occurred to us all that it would make an interesting movie. I slapped something together at the time, but I just went back and did it right this time.

About the plots

Each year, the capital per sector is divided by total capital, so that all capital adds up to one. Similarly, the labor per sector is divided by total labor, so that all labor adds up to one. The sectors of the economy are sorted by slope (labor divided by capital). That way, the plot forms an upward-curving arc from (0,0) to (1,1).
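A minimal R sketch of this construction, using made-up sector numbers rather than the actual BEA/BLS data:

    # Sketch: build the arc described above from hypothetical sector shares
    sectors <- data.frame(
      name    = c("manufacturing", "construction", "mining_logging", "services"),
      capital = c(40, 10, 30, 20),   # hypothetical capital expenditure by sector
      labor   = c(30, 15, 5, 50)     # hypothetical labor expenditure by sector
    )

    # Normalize so capital shares and labor shares each sum to one
    sectors$cap_share <- sectors$capital / sum(sectors$capital)
    sectors$lab_share <- sectors$labor   / sum(sectors$labor)

    # Sort by slope (labor share divided by capital share), flattest first
    sectors <- sectors[order(sectors$lab_share / sectors$cap_share), ]

    # Chain the segments end to end: an upward-curving arc from (0,0) to (1,1)
    x <- c(0, cumsum(sectors$cap_share))
    y <- c(0, cumsum(sectors$lab_share))
    plot(x, y, type = "o",
         xlab = "cumulative capital share", ylab = "cumulative labor share")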

BLS (U.S. Bureau of Labor Statistics) has payroll data starting in 1939, but from 1939 to 1946 it only tracks one sector: manufacturing. Starting in 1947, two more sectors were added: construction, and mining & logging. Not until 1964 was the service sector added, when it already accounted for 50% more in payrolls than manufacturing. Added at the same time was a sector combining trade, transportation, and utilities; these were split out in 1972. BLS never tracked farm payrolls – agriculture is largely taboo at BLS, it seems.

The capital data from BEA (U.S. Bureau of Economic Analysis) are remarkably consistent over the years. The sectors in BEA data are slightly different from the sectors in BLS data. It was easier to combine all agriculture and mining on the capital side to match the mining & logging labor data, so the plots will always overstate capital in that sector. From 1964 to 1971, when BLS combined trade, transportation, and utilities, the capital data are combined for those sectors. The combined sector is represented by a line striped in the colors of the individual sectors.

Things to note

  • The plots from 1947 to 1961 are not directly comparable with the other years because there are large, missing sectors during that time and especially at the end.
  • The  curvature changes considerably between 1971 and 1972, when trade, transportation, and utilities were split out. The transportation sector moves to the low-labor end, the utilities sector stays about in the middle, and the trade sector moves up to the high-labor end.
  • Manufacturing becomes steadily less steep, that is, less labor-intensive (more automated). This appears as a clockwise rotation of the manufacturing line.
  • The labor from manufacturing seems to go to construction between 1947 and 1963.
  • The service sector accounts for about 30% of labor in 1964. This grows steadily to about 80% in 2016.
  • Growth in the service sector is very rapid between 1972 and 2010.
  • The clockwise rotation of the manufacturing line is even more rapid  between 1964 and 1971 as the services sector takes an increasing share of labor.
  • The slope of the manufacturing line is relatively constant from 1972 to 2000, but resumes its clockwise rotation from 2001 to 2010.
  • The size and slope of the manufacturing line are pretty stable from 2011 to 2016.

Acknowledgements

Thanks to: Kirsten Andersen, Kristin Carl, Brittany Chacon, Yu Ting Chang, Andrew Detlefs, Kyle Dougherty, Thomas Henderson, Darren Ho, Jack Hodge, Alanis Jackson, Madeline Kee, Eric Knewitz, Jackson Kniebuehler, Phuong Mach, Abraham Maggard, Alexander Palm, Luisa Sanchez-Carrera, Teran Villa, Tsz Hong Yu

Math is too hard for Economics students?

The New Yorker ran an article about how math is too hard for Economics students.

My experience is that most Economics programs are not math-intensive. Maybe the top programs are – Harvard, Chicago, London School of Economics – because they can be. That is, a diploma from one of the top schools means you can do it all. If you can’t do it all, lower your expectations.

There are some math-phobic students (and professors) who want the high-prestige diplomas without the hard math. Piketty – like many celebrities – self-promotes by trivializing the institutions that got him where he is. He sounds a bit like someone in XKCD.

The reality is that there are some very important theories of Economics that require hard math. Will every economist need them in practice? No, not any more than most composers, computer scientists, or satellite engineers will need to use their foundational knowledge under normal circumstances. That’s not why they learn it: they learn it to be prepared for unusual circumstances.

The truth is that macroeconomics has a surfeit of competing explanations for almost everything – it doesn’t suffer a lack of out-of-the-box thinking. But it does suffer a shortage of sound ideas that can be tested theoretically and/or demonstrated through data. That’s where the hard math comes in.

Exam 1

Question 1

Consider the regression model

y = β0 + β1 x + β2 z + u

Suppose that the mean of the error term is not zero, in fact

E(u) = α0

where α0 is a constant.

a) Create a new error function based on u for which the expected value of the error function is zero.

Define a new error function v

v = u – α0

so that

E(v) = 0

b) Express u in terms of your new error function and substitute it into the regression model. Show how this transformed regression model has a transformed constant term.

Rearranging

u = α0 + v

Now

y = (α0 + β0) + β1 x + β2 z + v

which has intercept term α0 + β0.

c) Discuss the transformed constant term in terms of the original constant term and the transformed error function.

Considering the transformation as a function of v

U(v) = v + α0

then the intercept is merely transformed

U(β0) = β0 + α0

d) Show how the transformed regression model has the same estimators for x and z as the untransformed regression model.

Substituting u = α0 + v changes only the intercept; the coefficients on x and z are untouched, so the estimators β1 and β2 are the same in both the untransformed and the transformed models. Thus, the slope estimators remain unbiased under a linear transform of the regression model

U(y) = y + α0

Question 2

Here is a useful fact

Var(x + c) = Var(x)

where c is a constant.

a) Use this fact to show how the variance of your transformed error function in Question 1 relates to the variance of the original error term, Var(u).

The variance of the transformed error function is

Var(v) = Var(u – α0) = Var(u)

That is, the transformed error function has the same variance as the untransformed error function.

b) Discuss your results from Questions 1 and 2a in terms of OLS bias and efficiency.

The results in Question 1 show that a regression model is not biased by a linear transformation. The result in Question 2a shows that a linear transformation does not affect variance, and therefore will not impact OLS efficiency.
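Not part of the exam, but a small simulation in R (hypothetical data) illustrating the result: a nonzero-mean error shifts only the intercept estimate, while the slope estimates and their standard errors are unaffected.

    # Simulation: E(u) = alpha0 shifts the intercept but not the slopes
    set.seed(1)
    n      <- 1000
    alpha0 <- 5
    x <- rnorm(n); z <- rnorm(n)
    u <- rnorm(n, mean = alpha0)   # error term with nonzero mean
    y <- 2 + 3 * x - 1 * z + u     # true beta0 = 2, beta1 = 3, beta2 = -1

    fit <- lm(y ~ x + z)
    coef(fit)   # intercept near beta0 + alpha0 = 7; slopes near 3 and -1
    summary(fit)$coefficients[, "Std. Error"]   # unchanged relative to a zero-mean error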

Question 3

A colleague found a large dataset of household spending and saving data and proposed the following econometric model:

csave = β0 + β1 income + β2 nchild + β3 pareduc + β4 pctsh + β5 pctfood + β6 pctclo + β7 pctoth + u

where

csave = savings for college
income = household income
nchild = number of children in the household
pareduc = average education of adults in the household
pctsh = percent of household budget spent on shelter (0 to 100)
pctfood = percent of household budget spent on food (0 to 100)
pctclo = percent of household budget spent on clothing (0 to 100)
pctoth = percent of household budget not included in pctsh, pctfood, or pctclo

a) What do you expect to be the signs of the estimators?

Expected signs:

income positive (college is a normal good)
nchild positive or negative (complex interaction of other factors)
pareduc positive (propensity for education)
pctsh negative (opposite to income)
pctfood negative (opposite to income)
pctclo negative (opposite to income)
pctoth positive (opposite to shelter, food, and clothing)

b) Which variables would you consider examining for censoring?

Natural censoring is likely with income and number of children. This is further exacerbated with selection bias if, for example, the data only include home-owners, families with children, or families on assistance.

c) Do you see any potential problems with collinearity?

The four budget shares sum to 100, so together with the intercept they are perfectly collinear.

d) What counsel will you give your colleague about this model?

I will suggest that pctoth be dropped to eliminate the perfect collinearity.
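Not part of the exam, but a small R illustration with hypothetical budget shares showing the rank deficiency:

    # The four budget shares sum to 100, so with the intercept the design
    # matrix is rank-deficient (perfect collinearity)
    set.seed(2)
    n       <- 200
    pctsh   <- runif(n, 20, 50)
    pctfood <- runif(n, 10, 30)
    pctclo  <- runif(n, 2, 10)
    pctoth  <- 100 - pctsh - pctfood - pctclo   # forced by the definition

    X <- cbind(intercept = 1, pctsh, pctfood, pctclo, pctoth)
    qr(X)$rank   # 4, not 5: one column is redundant
    # lm() would report NA for one aliased share; dropping pctoth makes the fix explicit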

e) What counsel will you give your colleague about the data?

I will recommend plotting histograms of all the variables to get a sense of their distributions and look for censoring, selection bias, and possible binning, particularly of income.

Question 4

Listed below is the summary of a regression of cash dividends on after-tax profit. These are national aggregate statistics from the U.S. Department of Commerce for 1974 through 1986.

    Call:
    lm(formula = dividends ~ profit, data = mydata)

    Residuals:
         Min     1Q Median     3Q    Max 
     -9332  -4706  -2600   5203  11056

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)   
    (Intercept)  373.3014  9530.3787   0.039  0.96946   
    profit         0.4200     0.1154   3.641  0.00388 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 6956 on 11 degrees of freedom
    Multiple R-squared: 0.5465, Adjusted R-squared: 0.5052 
    F-statistic: 13.25 on 1 and 11 DF,  p-value: 0.003884

a) Describe the goodness of fit and significance of the model. Make your statement in terms of dividends and profit and not in terms of variables, parameters, etc. Use R² and the F-test to support your statement.

This model is a measure of what fraction of after-tax profits companies distribute to shareholders in the form of dividends. Although profit is the principal source of dividends, dividends are not the only way in which profit is spent. The regression results indicate that there is a strong, plausibly causal, relationship between profit and dividends, but that this relationship only tells about half the story. The strength of the relationship is indicated by the F-statistic of 13.25, which is significant at the one percent level. The extent to which this only tells part of the story is reflected in an R² of about 0.55, meaning that nearly half of the variation in dividends is not captured by this relationship. Given that many firms do not pay dividends, this is not surprising.

b) Discuss the meanings of the estimators and their significance. Again, make your statements in terms of the model, then use regression results to support them.

Many stocks do not pay dividends, with profit going into stock value, so the fraction will be less than one, but should be considerably more than zero. A constant term would reflect dividends paid irrespective of profit and would be, in principle, zero. The regression results show that, on average, between 1974 and 1986, U.S. corporations distributed 42% of their profits to shareholders. This result is significant at the one-percent level. The intercept is not significantly different from zero, as anticipated.

c) With a 95% confidence interval, assess the null hypothesis that there is no relationship between profit and dividends.

The null hypothesis that there is no relationship between profit and dividends is rejected at the 95% confidence level if the probability that all slope parameters are zero is less than five percent. The F-statistic puts that probability at less than 0.4%, for a confidence level above 99.6%, so the null hypothesis is rejected well within the 95% confidence requirement.
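As a cross-check, not part of the original answer, the t-based 95% confidence interval for the profit coefficient can be computed in R directly from the estimate and standard error reported above; it excludes zero, consistent with rejecting the null hypothesis.

    # 95% confidence interval for the profit coefficient, from the summary above
    b  <- 0.4200   # estimate
    se <- 0.1154   # standard error
    df <- 11       # residual degrees of freedom

    b + c(-1, 1) * qt(0.975, df) * se   # roughly (0.17, 0.67): excludes zero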