What if endogenous growth?

If you’re not an economist – or at least someone who took a macroeconomics course at the intermediate level or above – it’s unlikely you know what that means. First, however, the picture.

How national economies grow is a bit of a holy grail for macroeconomists – the people who study the whole economy of a nation. Robert Solow came up with a beautifully simple model of economic growth in 1956 for which he got a Nobel prize in 1987 (technically, the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel). Edmund Phelps, Nobel prize recipient in 2006, published a delightful fairy tale based on Solow’s growth model in 1961, from which we got the now-famous Golden Rule of economic growth. Many useful economic models came from these, and the basic Solow growth model is still used in contemporary macroeconomic analysis.

The Solow growth model and its many direct descendants assume that “growth happens”. That is, something makes the economy grow and we just put that into the model. We call these exogenous growth models (exo = outside, genous = created).

Some macroeconomists, however, were eager to find a model of the national economy that incorporated the thing that made it grow. We call these endogenous growth models (endo = inside). For example, a close descendant of the Solow growth model is the Ramsey–Cass–Koopmans model, which is partially endogenous.

An early proponent of endogenous growth theory is 2018 Nobel prize recipient Paul Romer. Like many great theories, it started with an extremely simple idea. In 1986 Romer proposed an incredibly (as in, not expected to happen in real life) simple model where national output is a linear function of capital alone. It looks like this: $$Y=AK$$ where $Y$ is national product (GDP), $K$ is total national capital stock, and $A$ is a constant that converts one into the other.

From a practical point of view, there are a few problems with this, not least of which is the constant returns to capital. This, if nothing else, is a violation of the laws of thermodynamics, and we all know that physics always wins!

But what if this model – called the AK model for obvious reasons – really applied to the U.S. economy? That’s what the plot at the top of this post is about. The top solid line is real capital stock in billions of 2011 dollars – about 56 trillion dollars in 2017. The lower solid line is real GDP in billions of 2012 dollars – about 18 trillion dollars in 2017. The fact that they’re measured in dollars one year apart doesn’t really matter – there’s a lot of lag in the national economy. These are both plotted against the left axis.

The dashed line in the plot is the ratio of the two and represents $A$ in the AK equation. This is plotted against the right axis. It’s coincidental that the capital stock plot appears to trace the average of the dashed line plot, but it helps to see the trend. That is, were the AK model to apply to the U.S. economy, $A$ has certainly not been constant over the past 67 years, not even on average.

So here’s the endogenous growth part of the AK model. Clearly $$dY=AdK$$

where $dY$ is the change of production (growth in the economy) and $dK$ is the change of capital stock. What we know about the change of capital stock is that it’s a net change, with investment coming in and depreciation going out $$dK=sY-\delta K$$ Here $s$ is the national saving rate (the fraction of income that households save) and $\delta$ is the depreciation rate (the fraction of capital stock that wears out or is used up each year).

Now this is another gross simplification, assuming, among other things, a closed economy (no imports or exports) so that total investment equals total household savings, sY. It also assumes no taxes or government spending (hard to picture, but taxes and government spending can be subtracted without a fundamental change to the model), and a constant rate of depreciation.

To get the economic growth rate, divide change of production by total production $$\frac{dY}{Y}=A\frac{sAK-\delta K}{AK}$$ where on the right side, Y has been replaced by AK, since they’re equal. This simplifies to $$g_Y=sA-\delta$$ where $g_Y$ is the economic growth rate, $\frac{dY}{Y}$.

To do something with this, we have to estimate household saving rate $s$ and depreciation rate $\delta$. For the past 40 years the saving rate has been below 10%, with a weird spike to 33% in April 2020 (https://fred.stlouisfed.org/series/PSAVERT). The national depreciation rate on fixed assets was about 5% of total capital in 2017 (https://fred.stlouisfed.org/series/M1TTOTL1ES000). Assuming $s=0.10$ and using $A=0.32$, the 2017 number from the plot above, we get $g_Y=0.1\times 0.32-0.05=-0.018$ or a negative 1.8%. All of these estimates except $A$ are averages over the past few decades when growth has averaged about 3%. Using the 2017 estimate for $A$ should have given us the highest possible growth rate.
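The arithmetic above is easy to reproduce. The following is a minimal R sketch using only the numbers quoted in the text; the helper name `ak_growth` is made up for illustration, and the last step (solving $g_Y=sA-\delta$ for the saving rate that would give 3% growth) is an added back-of-the-envelope check, not part of the original calculation.

```r
# AK-model growth rate: g_Y = s*A - delta
# (ak_growth is a hypothetical helper name, used only for this illustration)
ak_growth <- function(s, A, delta) s * A - delta

s     <- 0.10   # saving rate assumed in the text
A     <- 0.32   # 2017 output-to-capital ratio read from the plot
delta <- 0.05   # depreciation rate quoted in the text

g <- ak_growth(s, A, delta)
cat("growth rate:", g, "\n")

# Rearranging g_Y = s*A - delta: the saving rate the model would need
# in order to reproduce growth of about 3%
s_needed <- (0.03 + delta) / A
cat("saving rate needed for 3% growth:", s_needed, "\n")
```

With these inputs the model needs a saving rate of about 25% to reach 3% growth, well above anything in the saving data cited above outside the April 2020 spike.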

Clearly, the basic AK model does not approximate the US economy very well at all. Are there national economies it better approximates? That’s a topic for another time. Additional topics for another time: how to incorporate human capital and intellectual capital. While human capital may have the diminishing returns we expect (physics expects!) from physical capital, there’s no reason for intellectual capital to exhibit diminishing returns. SPOILER ALERT: this may be a way to get a real economy to look a lot more like the AK model.

What’s the risk of COVID-19 in New Mexico?

I’m not a doctor (not in medicine, anyway) but I am something of a specialist in probability and statistics. These are my thoughts on a low-probability, high-impact event: COVID-19 infection.

Let’s start with historical data, but the important idea is that historical statistics are predictors of the past, not the future. They are based on a state of the universe that existed some time before the statistics were taken, and the universe has changed – a lot – since then. The numbers behind the statistics are growing, and the thing that matters – and we don’t know – is the growth rate of those numbers.

Consider the population of people already infected with COVID-19 (as we will in a couple of paragraphs). That number is changing all the time.

Popular media like to use the term “exponential growth” to mean something like “a lot”, but that’s not the point. If the growth rate is near zero, then it takes a very long time for the population to increase even with exponential growth. If the growth rate is high, then the population is increasing in size rapidly.

What is the risk of contracting COVID-19 in New Mexico based on historical statistics? How does it compare to neighboring Western states? If we assume completely random interactions across the whole state, here is the probability of exposure per interaction with another person, based on infection rates on the morning of 15 March 2020:

NM 0.00000621
OR 0.00000835
WA 0.00008055
CA 0.00000956
ID 0.00000114
UT 0.00000285
CO 0.00001791
AZ 0.00000167

Those are really small numbers, and kind of hard to visualize. What’s the biggest crowd I anticipate encountering? Less than 10,000 – I hope! So, in a crowd of 10,000, how many would be infected?

NM: about one-sixteenth of a person
OR: about one-twelfth of a person
WA: slightly more than four-fifths of a person
CA: slightly less than one-tenth of a person
ID: slightly more than one-hundredth of a person
UT: about one thirty-fifth of a person
CO: a little less than one-fifth of a person
AZ: about one-sixtieth of a person
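For anyone who wants to check the crowd arithmetic, here is a minimal R sketch. It multiplies the per-interaction probabilities from the table by the crowd size, under the same completely-random-mixing assumption; the variable names are just for illustration.

```r
# Per-interaction exposure probabilities from the table (15 March 2020)
p <- c(NM = 0.00000621, OR = 0.00000835, WA = 0.00008055, CA = 0.00000956,
       ID = 0.00000114, UT = 0.00000285, CO = 0.00001791, AZ = 0.00000167)

crowd <- 10000

# Expected number of infected people in a crowd of that size
expected_infected <- p * crowd
print(round(expected_infected, 4))
```

Changing `crowd` rescales every entry in proportion, so a crowd ten times larger gives ten times the expected count.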

What we don’t know is the probability of infection given exposure, which is certainly less than 1.0. Worst case, assume 1.0, so these are also the risks of infection.

In reality, the probability of encountering the infection is much, much higher. Testing is only done on a very small fraction of individuals with symptoms, so there are a lot more infected people out there. And an unknown number of carriers are asymptomatic. And while we’re at it, let’s take a closer look at what we mean by asymptomatic. Literally, it means “not showing symptoms”, but in this context it means “a thriving community of the virus that hasn’t made the host sick – yet.”

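So what if the probability were ten times greater? Reduce the size of the crowd above to 1,000. Here in NM, with completely random mixing at ten times the tabulated rate, the chance that a crowd contains no infected person at all drops to 50/50 at roughly 11,000 people, and every smaller crowd still carries some risk.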
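Here is a rough R sketch of the crowd size at which the odds of meeting no infected person fall to 50/50. It assumes independent, completely random encounters, takes ten times the tabulated NM rate as the per-person probability, and uses $(1-p)^n$ for the chance that none of $n$ people is infected. The function name `p_none` is hypothetical.

```r
# Ten times the tabulated NM probability (the "what if" scenario)
p_nm <- 10 * 0.00000621

# Chance that none of n independently encountered people is infected
# (p_none is a hypothetical helper name)
p_none <- function(n, p) (1 - p)^n

# Crowd size at which that chance equals one half: solve (1 - p)^n = 0.5
n_half <- log(0.5) / log(1 - p_nm)
cat("crowd size for a 50/50 chance:", round(n_half), "\n")

# Chance of meeting no infected person in a few smaller crowds
print(round(p_none(c(10, 100, 1000), p_nm), 4))
```

The true per-person probability is the unknown here, so the output is only as good as the guess for `p_nm`.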

But these are all probabilities – your mileage will vary. You could meet only one person in a day but it could be the wrong person.

To reduce your exposure, stay home. When you do go out, avoid crowds. Even little crowds.

Assume you are exposed when you go out. These are ways to reduce your probability of infection:

  • Wash your hands when you get home.
  • Disinfect your cellphone before you wash your hands (you’re likely to have handled your cellphone while you were in a higher-risk environment like the supermarket). Don’t touch the disinfected phone before you wash your hands.
  • Avoid crowds and, more importantly, close proximity to others. Self-check-out at the supermarket looks better, doesn’t it? Not so much because there’s no checker opposite the register, but there’s no one standing right behind you in line, at least while you’re scanning.
  • Ordering online looks even better, no? DISINFECT ANY PACKAGES YOU RECEIVE. That package has been a lot of places and handled by a lot of people.

The Changing U.S. Economy

An assignment in my Spring 2018 senior-level macroeconomics course had my students look up the total capital expenditure and total labor expenditure by economic sector. Each student was assigned a different year. I gave them a spreadsheet that ordered the sectors by labor/capital ratio and plotted them in a unit box. While the students were presenting their results, it occurred to us all that it would make an interesting movie. I slapped something together at the time, but I just went back and did it right this time.

About the plots

Each year, the capital per sector is divided by total capital, so that all capital adds up to one. Similarly, the labor per sector is divided by total labor, so that all labor adds up to one. The sectors of the economy are sorted by slope (labor divided by capital). That way, their plot forms an upward curving arc from (0,0) to (1,1).
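The construction described above can be sketched in a few lines of R. This is only an illustration: the three sectors and their capital and labor totals are made-up placeholders, not BLS or BEA figures, and the column names are assumptions.

```r
# Hypothetical sector totals (illustrative placeholders, not BLS/BEA data)
sectors <- data.frame(
  sector  = c("A", "B", "C"),
  capital = c(50, 30, 20),
  labor   = c(10, 30, 60)
)

# Shares: each column sums to one
sectors$k_share <- sectors$capital / sum(sectors$capital)
sectors$l_share <- sectors$labor / sum(sectors$labor)

# Sort by slope (labor share over capital share), least labor-intensive first
sectors <- sectors[order(sectors$l_share / sectors$k_share), ]

# Cumulative shares trace the arc from (0,0) to (1,1)
x <- c(0, cumsum(sectors$k_share))
y <- c(0, cumsum(sectors$l_share))

plot(x, y, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "cumulative capital share", ylab = "cumulative labor share")
```

Because the segments are sorted by increasing slope, the resulting curve is convex, which is what gives the plot its upward-curving shape.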

BLS (U.S. Bureau of Labor Statistics) has payroll data starting in 1939, but from 1939 to 1946 it tracks only one sector: manufacturing. Starting in 1947, two more sectors were added: construction, and mining & logging. Not until 1964 was the service sector added, when it already accounted for 50% more in payrolls than manufacturing. Also added at that time was a sector combining trade, transportation, and utilities; these were split out in 1972. BLS never tracked farm payrolls – agriculture is largely taboo at BLS, it seems.

The capital data from BEA (U.S. Bureau of Economic Analysis) is remarkably consistent over the years. The sectors in BEA data are slightly different from the sectors in BLS data. It was easier to combine all agriculture and mining capital and pair it with the mining & logging labor data, so the plots will always overstate capital in that sector. From 1964 to 1971, when BLS combined trade, transportation, and utilities, the capital data are combined for those sectors. The combined sector is represented by a line striped in the colors for the individual sectors.

Things to note

  • The plots from 1947 to 1961 are not directly comparable with the other years because there are large, missing sectors during that time and especially at the end.
  • The curvature changes considerably between 1971 and 1972, when trade, transportation, and utilities were split out. The transportation sector moves to the low-labor end, the utilities sector stays about in the middle, and the trade sector moves up to the high-labor end.
  • Manufacturing is always becoming less steep, that is, less labor-intensive (more automated). This appears as a clockwise rotation of the manufacturing line.
  • The labor from manufacturing seems to go to construction between 1947 and 1963.
  • The service sector accounts for about 30% of labor in 1964. This grows steadily to about 80% in 2016.
  • Growth in the service sector is very rapid between 1972 and 2010.
  • The clockwise rotation of the manufacturing line is even more rapid between 1964 and 1971 as the services sector takes an increasing share of labor.
  • The slope of the manufacturing line is relatively constant from 1972 to 2000, but resumes its clockwise rotation from 2001 to 2010.
  • The size and slope of the manufacturing line are pretty stable from 2011 to 2016.


Thanks to: Kirsten Andersen, Kristin Carl, Brittany Chacon, Yu Ting Chang, Andrew Detlefs, Kyle Dougherty, Thomas Henderson, Darren Ho, Jack Hodge, Alanis Jackson, Madeline Kee, Eric Knewitz, Jackson Kniebuehler, Phuong Mach, Abraham Maggard, Alexander Palm, Luisa Sanchez-Carrera, Teran Villa, Tsz Hong Yu

When fractional reserve banking gets pathological

If you own a macroeconomics textbook published before 2009, throw it away. Or rip out the section on fractional reserve banking. Because it’s wrong.

Implicit (or, rarely, explicit) in all the cheer-leading about how banks “create” money by holding only a fraction of deposits in reserve is the assumption that the money multiplier would forever be greater than 1.0.

Here’s the M1 money multiplier as tracked by the St. Louis Fed

The event that turned all your macroeconomics texts into pet-cage liner is pretty obvious right there in the middle of the gray bar that is the Great Recession: the multiplier fell off a cliff, descending from 1.618 in mid-September 2008 to 1.0 in mid-November 2008. It then did the unthinkable and fell below 1.0, where it remains as of this posting (0.913 with the most recent data from 30 August 2017). It dipped below 0.8 four times between 2010 and 2016. The multiplier languished below 0.8 for three years and seven months in one stretch, falling to a minimum of 0.678 in mid-August 2014.

Although FRED doesn’t provide this particular data set that far back, the M1 money multiplier stayed well above 1.0 during the Great Depression. And it took almost 35 years to recover from its minimum in early 1940. So, as they say, get used to it: the multiplier will remain low for a long time. But less than 1.0?

What were the banks doing all this time?

Those dips after the Great Recession are the result of a quantitative easing monetary policy implemented by the Federal Reserve in three waves between November 2008 and October 2014. Quantitative easing means that the central bank (Fed in this case) continues to push money into the monetary base even though interest rates are so low that banks don’t want to lend it out (thereby “creating” money, as your macro textbook says). Part of the problem was a less noticed but even more controversial Fed policy change at this time – paying interest on money that banks deposited with the Fed. More on this later.

So, back to the textbook story. The money multiplier is defined as

$$ M = mB $$

where

$$\begin{array}{rcl}M&=&\textsf{money supply}\\m&=&\textsf{money multiplier}\\B&=&\textsf{monetary base}\end{array}$$

so that

$$m = \frac{M}{B}$$

For the simple model, the money supply is simply currency (C) plus demand deposits (D) – the M1, basically

$$M = C + D$$

The monetary base – the money the Fed puts into the economy (or takes out) – either goes into circulation as currency (C) or is held by banks as reserves (R)

$$B = C + R$$

Now we have

$$m = \frac{C + D}{C + R}$$

Dividing top and bottom by D

$$m=\frac{C/D + D/D}{C/D + R/D}$$

Make the substitutions

$$\begin{array}{rccl}pp&=&C/D&\textsf{public preference for currency versus demand deposits}\\bp&=&R/D&\textsf{bank preference for reserves versus demand deposits}\end{array}$$

so finally

$$m=\frac{pp + 1}{pp + bp}$$

Your macro textbook probably refers to bp as the reserve ratio – a central bank (Fed) monetary policy tool. First problem: your book also made the misleading statement that the Fed imposes a minimum reserve ratio and implied that banks would only ever keep that minimum in reserve. In reality, only a few large banks are subject to a minimum reserve ratio requirement and it’s generally not very much. Second problem: there is no maximum reserve ratio imposed on banks, so they could, in principle, put all their money in reserves and never make a single loan. Which is why quantitative easing became a necessity.

Let’s look at what happens when the money multiplier is less than 1.0

$$\begin{array}{rcl}\frac{pp + 1}{pp + bp}&<&1\\pp+1&<&pp+bp\\1&<&bp\end{array}$$

That is, it is the banks’ preference to hold more in reserves than they have in deposits. When the interest rate they can receive for making loans is around zero, and the Fed is paying interest on reserve deposits, is it any surprise?
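To see the threshold at work, here is a tiny R sketch of the multiplier formula $m=\frac{pp+1}{pp+bp}$. The function name `money_multiplier` and the values of pp and bp are illustrative assumptions, not estimates from data.

```r
# Money multiplier as a function of the two preference ratios
# (money_multiplier is a hypothetical helper name)
money_multiplier <- function(pp, bp) (pp + 1) / (pp + bp)

# Illustrative values only: hold pp fixed and raise the banks' reserve preference
m_low  <- money_multiplier(pp = 0.5, bp = 0.1)   # banks hold few reserves
m_one  <- money_multiplier(pp = 0.5, bp = 1.0)   # reserves equal deposits
m_high <- money_multiplier(pp = 0.5, bp = 1.5)   # reserves exceed deposits

cat(m_low, m_one, m_high, "\n")
```

Whatever value pp takes, the multiplier crosses 1.0 exactly when bp does.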

The calculus of it all

Here’s an interesting explanation using marginal effects. First, note that the marginal effect of changing the monetary base is

$$\begin{array}{rcl}\frac{\partial M}{\partial B}&=&m\end{array}$$

So, when $m > 1$, increasing the monetary base gave the Fed more bang for the buck, but when $m=0.75$, the banks only put 75 cents of each of the Fed’s dollars into circulation. The Fed’s dollars are our dollars, by the way.

Now consider the marginal effect of a change in the banks’ preference for reserves versus deposits

$$\begin{array}{rcl}\frac{\partial M}{\partial bp} & = & B\frac{\partial m}{\partial bp}\\& = & B\left[-\frac{pp+1}{\left(pp+bp\right)^{2}}\right]\\& = & -\frac{B}{pp+bp}\left[\frac{pp+1}{pp+bp}\right]\end{array}$$

The term in the square brackets is just the money multiplier, so

$$\frac{\partial M}{\partial bp} = -\frac{B}{pp+bp}m$$

Recall, however, that

$$\begin{array}{rcl}pp+bp & = & \frac{C}{D}+\frac{R}{D}\\& = & \frac{C+R}{D}\end{array}$$

and $$C + R = B$$ so

$$pp + bp = \frac{B}{D}$$


$$\frac{\partial M}{\partial bp} = -\frac{B}{B/D}m = -Dm$$

That makes sense: an increase in the banks’ preference for reserves results in a decrease of the money supply proportional to the money multiplier and the amount of demand deposits.

Now consider the marginal effect of a change in the public’s preference for currency versus deposits

$$\begin{array}{rcl}\frac{\partial M}{\partial pp} & = & B\frac{\partial m}{\partial pp}\\& = & B\left[\frac{1}{pp+bp}-\frac{pp+1}{\left(pp+bp\right)^{2}}\right]\\& = & \frac{B}{pp+bp}\left[1-\frac{pp+1}{pp+bp}\right]\\& = & \frac{B}{pp+bp}\left[1-m\right]\end{array}$$

As we saw before, the first term is just D. Reversing the subtraction in brackets gives us

$$\frac{\partial M}{\partial pp} = -D\left(m-1\right)$$

Now, as long as $m > 1$, this says that increased public preference for currency over deposits reduces the money supply proportional to D, but not quite as strongly as increased preference for reserves by banks. That, too, makes sense. And, it’s almost certainly what your macro textbook says: increased preference for currency decreases the money multiplier.

But the post-apocalyptic scenario of $m < 1$ probably doesn’t appear in that macro textbook. That is, when the money multiplier is less than one, an increase in the public’s preference for currency actually increases the money supply. Not to anthropomorphize, but it’s like it’s telling us that, when $m < 1$, we can keep the banks from squirreling away all our money by holding on to it (or putting it in the freezer, or under the mattress, or wherever).
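Both marginal effects can be checked numerically. The R sketch below compares central finite differences of $M=B\frac{pp+1}{pp+bp}$ with the closed forms $-Dm$ and $-D(m-1)$. It holds the monetary base fixed, and the values of B, pp, and bp are illustrative assumptions chosen so that $m<1$.

```r
# Money supply as a function of the preference ratios and the monetary base
money_supply <- function(pp, bp, B) B * (pp + 1) / (pp + bp)

# Illustrative values chosen so that m < 1 (not estimates from data)
B  <- 100
pp <- 0.5
bp <- 1.5

m <- (pp + 1) / (pp + bp)   # money multiplier
D <- B / (pp + bp)          # demand deposits, from pp + bp = B/D

# Central finite differences, holding the monetary base fixed
h <- 1e-6
dM_dbp <- (money_supply(pp, bp + h, B) - money_supply(pp, bp - h, B)) / (2 * h)
dM_dpp <- (money_supply(pp + h, bp, B) - money_supply(pp - h, bp, B)) / (2 * h)

cat("dM/dbp numeric:", dM_dbp, " closed form -D*m:", -D * m, "\n")
cat("dM/dpp numeric:", dM_dpp, " closed form -D*(m-1):", -D * (m - 1), "\n")
```

With $m<1$ the second derivative comes out positive, which is the sign flip described above.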


Bilateral trade deficits and other nonsense

Paul Krugman (sort of) tweeted about the hoopla over the US trade deficit with Germany. Krugman points out that bilateral trade balance is irrelevant in a global economy because the global economy is a complex organism and individual relationships have to be considered in the bigger context. All true if you believe in the global economy – and the vast majority of American consumers do based on their shopping behavior. Then Krugman digs down into the nuts and bolts of the US trade relationship with Germany.

As Krugman points out, the trade relationship with any EU country is complicated, since goods arriving at any port within the EU could be destined for any national market within the union. Krugman speculates that the large US trade surpluses with Netherlands and Belgium represent goods ultimately consumed throughout the EU, including Germany.

With this thought in mind, I took the US Census balance of trade data, computed the 2016 bilateral net exports from the US (exports from the US minus imports to the US) for each EU country, and divided that by the 2016 population of each country. Here’s what I got


The thinking here is that if some countries act as ports for US trade with the whole EU, those countries should have disproportionately large trade deficits or surpluses with the US. For example, the overall US trade deficit with the EU is \$287 per person living in the EU. For Germany, it’s \$789 per person in Germany. Belgium and Netherlands have US trade surpluses of \$1348 and \$1427, respectively, per person in each of those countries. Luxembourg chips in another \$1648 per person, but the half-million people of Luxembourg are not going to turn around the US export economy any time soon.
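The per-capita calculation itself is simple. Here is a minimal R sketch with two made-up countries; every number is an illustrative placeholder rather than Census data, and the column names are assumptions.

```r
# Illustrative placeholder figures, NOT Census data
trade <- data.frame(
  country    = c("Country A", "Country B"),
  exports    = c(40e9, 10e9),   # US exports to the country, in dollars
  imports    = c(20e9, 50e9),   # US imports from the country, in dollars
  population = c(10e6, 80e6)
)

# Bilateral net exports from the US, then per person in the partner country
trade$net_exports <- trade$exports - trade$imports
trade$per_capita  <- trade$net_exports / trade$population

print(trade[, c("country", "per_capita")])
```

A positive value is a US surplus per person in that country and a negative value is a US deficit.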

These numbers support Krugman’s idea that Belgium and Netherlands are importers for the whole EU. And the US’s huge per capita trade deficit with Ireland probably reflects that Ireland is an exporter for the whole EU. The population of Ireland is a little less than one percent of the EU, however, so the size of that number is not as dramatic as it seems.

Social Security: the laws of thermodynamics still hold

You’ve heard all the hyperbole about Social Security going bankrupt, or broke, or whatever. If you still haven’t figured out how ridiculous that is, research pay-as-you-go plans. The system doesn’t run out of money unless the number of working people goes to zero. Chances are there will be far more serious things to worry about if that ever happens.

Here’s a very simple way to think about Social Security: every working person supports herself or himself along with some number of other people – dependents and people collecting Social Security. How many people? About two.

You may also have heard that, because of baby boomers (you know, the generation that made America rich but are now portrayed as economic parasites), the number of people collecting Social Security will soon far outnumber the number of working people. An incredibly improbable scenario, given what we know of the biology of the human species and, for that matter, the laws of thermodynamics.

But skipping over heat death analogies of our economic future, let’s just look at some numbers. Or, better yet, a graph made from some numbers. The numbers are from the Census Bureau’s 2014 population estimate.


The pair of lines at the bottom represent the ratio of retirees to workers over the next 44 years. Yes, it is increasing – from about 0.26 in 2012 to about 0.41 in 2060. That’s about 3 retirees for every 12 workers in 2012, to about 5 retirees for every 12 workers in 2060. This, by the way, is not the actual number of retirees, it’s the population aged 65 and above. Many of those people will continue to work, so this is a worst-case scenario, basically.

In the meantime, American families are getting smaller at about the same rate, as shown in the pair of lines marked Dependents. The average number of dependents per worker will go from about 1.78 in 2012 to about 1.57 in 2060. When you add them together it means that a worker in 2012, who supported an average of 2.04 people, will be supporting 1.99 people by the year 2060. These are the lines at the top.
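The totals are just the sum of the two ratios. Here is a quick R check using the rounded figures quoted above; because the inputs are rounded, the 2060 sum comes out at 1.98, within rounding of the 1.99 read from the graph.

```r
# Rounded per-worker ratios quoted in the text
retirees   <- c("2012" = 0.26, "2060" = 0.41)
dependents <- c("2012" = 1.78, "2060" = 1.57)

# People supported per worker: retirees plus dependents
supported <- retirees + dependents
print(supported)

# The retiree ratios expressed per 12 workers
print(round(12 * retirees, 1))
```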

This is the culmination of a trend that started a long time ago. Multigenerational families are disappearing as retirees become increasingly self-supporting through both private retirement insurance (e.g. 401k) and public retirement insurance (Social Security). Households are seeing financial and social benefits of bearing less of the support for older family members at the same time that they are choosing to have smaller families. And, even while trading some of the costs of raising children for the costs of supporting parents, households will still manage to lower their overall burden in terms of the number of individuals supported.

And why are there two lines for each group in the graph? Before the 2008/2009 Great Recession, labor force participation was about 66% (down from an all-time high of 67%). That is, 66% of Americans aged 16 and above were working or actively looking for work. Through the recession and after, labor force participation fell to nearly 62% before rebounding slightly.


There is some evidence that this new lower level of labor force participation is a structural change. That is, many American households have elected to step back from all adults earning full time incomes yet amassing tremendous consumer debt. Young families are opting for simpler lifestyles.

Returning to the dependents graph at the top of the page, the scenario where labor force participation rebounds to 66% is portrayed with dashed lines. And the scenario where labor force participation stays at 62% is shown with the solid lines. Either way, the laws of thermodynamics are not violated – we do not suddenly dissipate all the economic energy of 154 million American workers, nor do 40 million retirees create a black hole into which all that energy is sucked without a trace.

Math is too hard for Economics students?

The New Yorker ran an article about how math is too hard for Economics students.

My experience is that most Economics programs are not math-intensive. Maybe the top programs are – Harvard, Chicago, London School of Economics – because they can be. That is, a diploma from one of the top schools means you can do it all. If you can’t do it all, lower your expectations.

There are some math-phobic students (and professors) who want the high-prestige diplomas without the hard math. Piketty – like many celebrities – self-promotes by trivializing the institutions that got him where he is. He sounds a bit like someone in XKCD.

The reality is that there are some very important theories of Economics that require hard math. Will every economist need them in practice? No, not any more than most composers, computer scientists, or satellite engineers will need to use their foundational knowledge under normal circumstances. That’s not why they learn it: they learn it to be prepared for unusual circumstances.

The truth is that macroeconomics has a surfeit of competing explanations for almost everything – it doesn’t suffer a lack of out-of-the-box thinking. But it does suffer a shortage of sound ideas that can be tested theoretically and/or demonstrated through data. That’s where the hard math comes in.

Trends in personal savings

I was looking at some St. Louis Federal Reserve data in FRED to illustrate the relation between personal savings and interest rates. The savings data (GPSAVE) are pretty noisy, so I did a boxcar smooth over 21 quarters (5 years more or less) and plotted the quarterly change against 3-Month Treasury Bill: Secondary Market Rate (TB3MS).
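For reference, a 21-quarter boxcar smooth is just a centered moving average with equal weights. The R sketch below shows one way to do it with `stats::filter`. The series here is synthetic, generated as a stand-in for GPSAVE, so the output says nothing about actual savings.

```r
set.seed(1)

# Synthetic quarterly series standing in for GPSAVE (not real data)
savings <- cumprod(c(100, 1 + rnorm(119, mean = 0.02, sd = 0.05)))

# 21-quarter centered boxcar (equal-weight moving average);
# the first and last 10 quarters come out as NA
boxcar <- as.numeric(stats::filter(savings, rep(1 / 21, 21), sides = 2))

# Quarterly percent change of the smoothed series
pct_change <- 100 * diff(boxcar) / head(boxcar, -1)

summary(pct_change)
```

The smoothed series loses ten quarters at each end, so the plotted change covers a slightly shorter span than the raw data.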


I didn’t expect that sudden shift around 1981, from a quarterly increase of about 2.4% pre-1981 to a 1.2% increase post-1982.

What I did expect was that the quarterly change would reflect a combination of inflation and increasing population. I didn’t expect either of those to jump in 1981 (spike in interest rate or not) but, hey, it’s a complex system.

Then, Greg Mankiw, author of the fine textbook I’m using in Intermediate Macroeconomics, pointed out that I was using nominal savings rather than real savings. That is, I didn’t adjust the total dollars by inflation. So I did that, using Personal Consumption Expenditures: Chain-type Price Index (PCECTPI) from FRED. Here’s what my graph looks like using real savings:
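The adjustment amounts to dividing nominal dollars by the price index. A minimal R sketch follows; the numbers are illustrative placeholders rather than FRED data, and it assumes an index scaled so that the base period equals 100.

```r
# Illustrative placeholder values, not FRED data
nominal     <- c(100, 110, 121)   # nominal savings (stand-in for GPSAVE)
price_index <- c(100, 105, 110)   # price index, base period = 100 (stand-in for PCECTPI)

# Real savings in base-period dollars
real <- nominal / (price_index / 100)

# Quarter-over-quarter growth, nominal versus real
nominal_growth <- 100 * diff(nominal) / head(nominal, -1)
real_growth    <- 100 * diff(real) / head(real, -1)

print(round(cbind(nominal_growth, real_growth), 2))
```

In this toy example nominal savings grow 10% per period while real savings grow only about 5%, the kind of gap that inflation opens between the two graphs.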


Now, no sudden drop in 1981, but a change in slope around 1984/1985. It looks like, leading up to 1984, real savings increased each quarter, but the amount it increased was going down by 0.0101% per quarter. After 1985, however, the amount that real savings increased started going up by 0.0023% each quarter. The post-1985 trend, by the way, is hardly conclusive – less than 14% confidence in that number. In fact, it could be zero – meaning that the real savings has increased at a constant rate since 1985. Even if that is the case, it shows that real savings increase by 0.53% per quarter, while the US population increases by about 0.18% per quarter, based on Census Bureau estimate NST-EST2012-06.

Breusch-Pagan test for heteroskedasticity

The program below performs the Breusch-Pagan test step-by-step, then uses bptest from the lmtest package.
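As a reminder of what the step-by-step version computes, here is the LM form of the test in standard notation. The symbols $\hat{u}$, $k$, and $n$ are introduced for this note: $\hat{u}$ are the OLS residuals, $k$ is the number of regressors, and $n$ is the number of observations. First regress the squared residuals on the original regressors,

$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \textsf{error}$$

and keep the $R^2$ from that regression, written $R^2_{\hat{u}^2}$. Under the null hypothesis of homoskedasticity, $\delta_1 = \dots = \delta_k = 0$, and the statistic

$$LM = n\,R^2_{\hat{u}^2}$$

is asymptotically distributed $\chi^2_k$. In the listing below, $k=3$ (lotsize, sqrft, bdrms).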

In order to use the built-in heteroskedasticity tests in R, you’ll have to install the lmtest package, for example with install.packages("lmtest").



Problems, examples, and data from Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 4th ed. South-Western Pub. 2009.

Listing of the R program.

#  30 Oct 2012 D.S.Dixon
# Reproduces Example 8.4 in 
#  Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 4th ed. South-Western Pub. 2009.

# price     assess    bdrms     lotsize   sqrft     colonial  lprice    lassess  
# llotsize  lsqrft    
#   Obs:    88
#   1. price                    house price, $1000s
#   2. assess                   assessed value, $1000s
#   3. bdrms                    number of bedrooms
#   4. lotsize                  size of lot in square feet
#   5. sqrft                    size of house in square feet
#   6. colonial                 =1 if home is colonial style
#   7. lprice                   log(price)
#   8. lassess                  log(assess)
#   9. llotsize                 log(lotsize)
#  10. lsqrft                   log(sqrft)


mydata <- read.table("hprice1.csv", sep=",",  header = TRUE, na.strings = ".")

# the model: price in levels on lot size, square footage, and bedrooms
formula0 <- price ~ lotsize + sqrft + bdrms
myfit0 <- lm(formula0, data=mydata)
output0 <- summary(myfit0)

# regress the residuals squared against the same model
myfit1 <- lm(I(myfit0$residuals^2) ~ lotsize + sqrft + bdrms, data=mydata)
output1 <- summary(myfit1)

# H0: u^2 does not depend on any regressor (i.e. all slope coefficients are statistically zero)
# This would be the F-test if it's homoskedastic (which we don't know 'till after we
#   test it) so we use the LM test where LM ~ chi-squared(df = number of regressors in the model)
Rsq <- output1$r.squared
N <- length(output0$residuals)
df <- length(myfit0$coefficients) - 1
LM <- N * Rsq
# P [X > x]
pvalue <- pchisq(LM, df, lower.tail = FALSE)

cat("LM = ", LM, "\n")
cat("df = ", df, "\n")
cat("p = ", pvalue, "\n")

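# now use the canned version of Breusch-Pagan from the lmtest package
# (bptest studentizes by default, which is the N * R-squared form computed above)
library(lmtest)
print(bptest(myfit0))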

Heteroskedasticity-robust Standard Errors

The program below calculates the heteroskedasticity-robust standard errors.

In order to use the hccm function in R, you’ll have to install the car package, for example with install.packages("car").

Then, assuming you saved regression results to myfit0, the robust standard errors are the square roots of the diagonal of the matrix that hccm(myfit0) returns.



Problems, examples, and data from Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 4th ed. South-Western Pub. 2009.

Listing of the R program.

#  31 Oct 2012 D.S.Dixon
# Reproduces Example 8.2 in 
#  Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 4th ed. South-Western Pub. 2009.


mydata <- read.table("GPA3.csv", sep=",",  header = TRUE, na.strings = ".")
cleandata <- na.omit(mydata[mydata$spring == 1,c("cumgpa","sat","hsperc","tothrs","female","black","white")])

myfit0 <- lm(cumgpa~sat + hsperc + tothrs + female + black + white, data=cleandata)
output0 <- summary(myfit0)

# hccm (heteroskedasticity-corrected covariance matrix) comes from the car package
library(car)

# print the heteroskedasticity-robust standard errors
print(sqrt(diag(hccm(myfit0, type = "hc0"))))