Bilateral trade deficits and other nonsense

Paul Krugman (sort of) tweeted about the hoopla over the US trade deficit with Germany. Krugman points out that bilateral trade balance is irrelevant in a global economy because the global economy is a complex organism and individual relationships have to be considered in the bigger context. All true if you believe in the global economy – and the vast majority of American consumers do based on their shopping behavior. Then Krugman digs down into the nuts and bolts of the US trade relationship with Germany.

As Krugman points out, the trade relationship with any EU country is complicated, since goods arriving at any port within the EU could be destined for any national market within the union. Krugman speculates that the large US trade surpluses with the Netherlands and Belgium represent goods ultimately consumed throughout the EU, including Germany.

With this thought in mind, I took the US Census balance of trade data, computed the 2016 bilateral net exports from the US (exports from the US minus imports to the US) for each EU country, and divided that by each country's 2016 population. Here's what I got:
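A minimal sketch of that per-capita calculation. The trade and population figures below are rough, illustrative approximations of the 2016 values for three of the countries, not the actual downloaded Census data:

```r
# Per-capita bilateral net exports: (US exports - US imports) / partner population.
# All figures are approximate 2016 values, for illustration only.
trade <- data.frame(
  country    = c("Germany", "Belgium", "Netherlands"),
  exports    = c(49365, 32178, 40366),    # US exports, millions of USD (approximate)
  imports    = c(114227, 16894, 16153),   # US imports, millions of USD (approximate)
  population = c(82.2e6, 11.3e6, 17.0e6)  # 2016 population (approximate)
)

# net exports from the US, in dollars per resident of the partner country
trade$net_per_capita <- (trade$exports - trade$imports) * 1e6 / trade$population
print(trade[, c("country", "net_per_capita")])
```

With these rough inputs, Germany comes out as a deficit of several hundred dollars per person, while Belgium and the Netherlands come out as four-figure surpluses, consistent with the numbers discussed below.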


The thinking here is that if some countries act as ports for US trade with the whole EU, those countries should have disproportionately large trade deficits or surpluses with the US. For example, the overall US trade deficit with the EU is $287 per person living in the EU. For Germany, it's $789 per person in Germany. The US runs trade surpluses with Belgium and the Netherlands of $1348 and $1427, respectively, per person in each of those countries. Luxembourg chips in another $1648 per person, but the half-million people of Luxembourg are not going to turn around the US export economy any time soon.

These numbers support Krugman's idea that Belgium and the Netherlands are importers for the whole EU. Likewise, the US's huge per capita trade deficit with Ireland probably reflects that Ireland is an exporter for the whole EU. Ireland's population is a little less than one percent of the EU's, however, so that number is not as dramatic as it seems.

Social Security: the laws of thermodynamics still hold

You’ve heard all the hyperbole about Social Security going bankrupt, or broke, or whatever. If you still haven’t figured out how ridiculous that is, research pay-as-you-go plans. The system doesn’t run out of money unless the number of working people goes to zero. Chances are there will be far more serious things to worry about if that ever happens.

Here’s a very simple way to think about Social Security: every working person supports herself or himself along with some number of other people – dependents and people collecting Social Security. How many people? Less than two.

You may also have heard that, because of baby boomers (you know, the generation that made America rich but is now portrayed as economic parasites), the number of people collecting Social Security will soon far outnumber the number of working people. That is an incredibly improbable scenario, given what we know of the biology of the human species and, for that matter, the laws of thermodynamics.

But skipping over heat death analogies of our economic future, let’s just look at some numbers. Or, better yet, a graph made from some numbers. The numbers are from the Census Bureau’s 2014 population estimate.


The pair of lines at the bottom represents the ratio of retirees to workers over the next 44 years. Yes, it is increasing: from about 0.26 in 2012 to about 0.41 in 2060. That's about 3 retirees for every 12 workers in 2012, rising to about 5 retirees for every 12 workers in 2060. This, by the way, is not the actual number of retirees; it's the population aged 65 and above. Many of those people will continue to work, so this is basically a worst-case scenario.

In the meantime, American families are getting smaller at about the same rate, as shown in the pair of lines marked Dependents. The average number of dependents per worker will go from about 1.78 in 2012 to about 1.57 in 2060. When you add the two together, it means that a worker who supported an average of 2.04 people in 2012 will be supporting 1.99 people by the year 2060. These are the lines at the top.
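The support-burden arithmetic is just the sum of the two ratios; a quick check with the rounded figures quoted above:

```r
# Support per worker = dependents per worker + retirees (65+) per worker.
# Values are the rounded figures quoted in the text.
dependents <- c(y2012 = 1.78, y2060 = 1.57)
retirees   <- c(y2012 = 0.26, y2060 = 0.41)

burden <- dependents + retirees
print(burden)  # about 2.04 in 2012 vs. roughly 1.98-1.99 in 2060 (unrounded inputs give 1.99)
```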

This is the culmination of a trend that started a long time ago. Multigenerational families are disappearing as retirees become increasingly self-supporting through both private retirement insurance (e.g. 401k) and public retirement insurance (Social Security). Households are seeing financial and social benefits of bearing less of the support for older family members at the same time that they are choosing to have smaller families. And, even while trading some of the costs of raising children for the costs of supporting parents, households will still manage to lower their overall burden in terms of the number of individuals supported.

And why are there two lines for each group in the graph? Before the 2008/2009 Great Recession, labor force participation was about 66% (down from an all-time high of 67%). That is, 66% of Americans aged 16 and above were working or actively looking for work. Through the recession and after, labor force participation fell to nearly 62% before rebounding slightly.


There is some evidence that this new, lower level of labor force participation is a structural change: many American households have stepped back from the model in which every adult earns a full-time income while the household amasses tremendous consumer debt. Young families are opting for simpler lifestyles.

Returning to the dependents graph at the top of the page, the scenario where labor force participation rebounds to 66% is portrayed with dashed lines. And the scenario where labor force participation stays at 62% is shown with the solid lines. Either way, the laws of thermodynamics are not violated – we do not suddenly dissipate all the economic energy of 154 million American workers, nor do 40 million retirees create a black hole into which all that energy is sucked without a trace.

Math is too hard for Economics students?

The New Yorker ran an article about how math is too hard for Economics students.

My experience is that most Economics programs are not math-intensive. Maybe the top programs are – Harvard, Chicago, London School of Economics – because they can be. That is, a diploma from one of the top schools means you can do it all. If you can’t do it all, lower your expectations.

There are some math-phobic students (and professors) who want the high-prestige diplomas without the hard math. Piketty – like many celebrities – self-promotes by trivializing the institutions that got him where he is. He sounds a bit like someone in XKCD.

The reality is that there are some very important theories of Economics that require hard math. Will every economist need them in practice? No, not any more than most composers, computer scientists, or satellite engineers will need to use their foundational knowledge under normal circumstances. That’s not why they learn it: they learn it to be prepared for unusual circumstances.

The truth is that macroeconomics has a surfeit of competing explanations for almost everything – it doesn’t suffer a lack of out-of-the-box thinking. But it does suffer a shortage of sound ideas that can be tested theoretically and/or demonstrated through data. That’s where the hard math comes in.

Trends in personal savings

I was looking at some St. Louis Federal Reserve data in FRED to illustrate the relation between personal savings and interest rates. The savings data (GPSAVE) are pretty noisy, so I did a boxcar smooth over 21 quarters (5 years more or less) and plotted the quarterly change against 3-Month Treasury Bill: Secondary Market Rate (TB3MS).
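The boxcar smooth can be done with R's stats::filter. A sketch on placeholder data (the real series would be the quarterly GPSAVE values downloaded from FRED):

```r
# 21-quarter boxcar (moving-average) smooth, then quarterly percent change.
set.seed(1)
gpsave <- cumsum(rnorm(200, mean = 5)) + 500  # placeholder standing in for GPSAVE

k <- 21
smoothed <- stats::filter(gpsave, rep(1 / k, k), sides = 2)  # centered boxcar; NA at both ends

s <- as.numeric(smoothed)
pct_change <- 100 * diff(s) / s[-length(s)]  # quarterly percent change of the smoothed series
```

With sides = 2 the window is centered, so the first and last (k - 1)/2 = 10 quarters come out NA.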


I didn’t expect that sudden shift around 1981, from a quarterly increase of about 2.4% pre-1981 to a 1.2% increase post-1982.

What I did expect was that the quarterly change would reflect a combination of inflation and increasing population. I didn’t expect either of those to jump in 1981 (spike in interest rate or not) but, hey, it’s a complex system.

Then, Greg Mankiw, author of the fine textbook I’m using in Intermediate Macroeconomics, pointed out that I was using nominal savings rather than real savings. That is, I didn’t adjust the total dollars by inflation. So I did that, using Personal Consumption Expenditures: Chain-type Price Index (PCECTPI) from FRED. Here’s what my graph looks like using real savings:
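The deflation step is a one-liner: divide nominal dollars by the price index scaled to 1 in the base year. A sketch with made-up numbers standing in for the FRED series:

```r
# Real savings = nominal savings / (price index / 100).
# A chain-type index like PCECTPI equals 100 in its base year. Values are illustrative.
nominal_savings <- c(950, 980, 1010, 1045)     # billions of nominal dollars
price_index     <- c(98.5, 99.2, 100.0, 100.9) # chain-type price index

real_savings <- nominal_savings / (price_index / 100)  # base-year dollars
```

In the base-year quarter (index = 100) real and nominal savings coincide; after the base year, rising prices pull real savings below the nominal figure.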


Now, no sudden drop in 1981, but a change in slope around 1984/1985. It looks like, leading up to 1984, real savings increased each quarter, but the amount it increased was going down by 0.0101% per quarter. After 1985, however, the amount that real savings increased started going up by 0.0023% each quarter. The post-1985 trend, by the way, is hardly conclusive – less than 14% confidence in that number. In fact, it could be zero – meaning that the real savings has increased at a constant rate since 1985. Even if that is the case, it shows that real savings increase by 0.53% per quarter, while the US population increases by about 0.18% per quarter, based on Census Bureau estimate NST-EST2012-06.

Breusch-Pagan test for heteroskedasticity

The program below performs the Breusch-Pagan test step-by-step, then uses bptest from the lmtest package.

In order to use the built-in heteroskedasticity tests in R, you’ll have to install the lmtest package.
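The installation step, presumably what belongs here, is one line run once, plus a library call in each session:

```r
# one-time install from CRAN
install.packages("lmtest")

# load in each session; provides bptest() for the Breusch-Pagan test
library(lmtest)
```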



Problems, examples, and data from Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 4th ed. South-Western Pub. 2009.

Listing of the R program.

#  30 Oct 2012 D.S.Dixon
# Reproduces Example 8.4 in 
#  Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 4th ed. South-Western Pub. 2009.

# price     assess    bdrms     lotsize   sqrft     colonial  lprice    lassess  
# llotsize  lsqrft    
#   Obs:    88
#   1. price                    house price, $1000s
#   2. assess                   assessed value, $1000s
#   3. bdrms                    number of bedrooms
#   4. lotsize                  size of lot in square feet
#   5. sqrft                    size of house in square feet
#   6. colonial                 =1 if home is colonial style
#   7. lprice                   log(price)
#   8. lassess                  log(assess)
#   9. llotsize                 log(lotsize)
#  10. lsqrft                   log(sqrft)


mydata <- read.table("hprice1.csv", sep=",",  header = TRUE, na.strings = ".")

myfit0 <- lm(price ~ lotsize + sqrft + bdrms, data=mydata)
output0 <- summary(myfit0)

# regress the residuals squared against the same model
myfit1 <- lm(I(myfit0$residuals^2) ~ lotsize + sqrft + bdrms, data=mydata)
output1 <- summary(myfit1)

# H0: u^2 does not depend on any regressor (i.e. all slope coefficients are statistically zero)
# This would be the F-test if the errors were homoskedastic (which we don't know until after we
#   test) so we use the LM test, where LM ~ chi-squared(df = number of regressors in the model)
Rsq <- output1$r.squared
N <- length(output0$residuals)
df <- length(myfit0$coefficients) - 1
LM <- N * Rsq
# P [X > x]
pvalue <- pchisq(LM, df, lower.tail = FALSE)

cat("LM = ", LM, "\n")
cat("df = ", df, "\n")
cat("p = ", pvalue, "\n")

# now use the canned version of Breusch-Pagan from the lmtest package
library(lmtest)
bptest(myfit0)

Heteroskedasticity-robust Standard Errors

The program below calculates the heteroskedasticity-robust standard errors.

In order to use the built-in heteroskedasticity tools in R, you’ll have to install the car package (which provides hccm) along with the lmtest package (which provides bptest).


Then, assuming you saved your regression results to myfit0, run the Breusch-Pagan test:
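The call itself is a one-liner. A self-contained sketch, with a deliberately heteroskedastic toy model standing in for the saved myfit0:

```r
library(lmtest)  # provides bptest()

# toy regression with error variance growing in x, standing in for myfit0
set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50, sd = x)   # heteroskedastic errors
myfit0 <- lm(y ~ x)

bp <- bptest(myfit0)  # Breusch-Pagan test; H0: homoskedastic errors
print(bp)
```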



Problems, examples, and data from Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach, 4th ed. South-Western Pub. 2009.

Listing of the R program.

#  31 Oct 2012 D.S.Dixon
# Reproduces Example 8.2 in 
#  Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 4th ed. South-Western Pub. 2009.


mydata <- read.table("GPA3.csv", sep=",",  header = TRUE, na.strings = ".")
cleandata <- na.omit(mydata[mydata$spring == 1,c("cumgpa","sat","hsperc","tothrs","female","black","white")])

myfit0 <- lm(cumgpa~sat + hsperc + tothrs + female + black + white, data=cleandata)
output0 <- summary(myfit0)

# print the heteroskedasticity-robust standard errors
library(car)   # provides hccm()
print(sqrt(diag(hccm(myfit0, type = "hc0"))))
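A usage note: lmtest::coeftest accepts the same hccm covariance matrix and produces a full coefficient table with robust standard errors and t statistics, which pairs naturally with the listing above. A self-contained sketch with a toy model:

```r
library(car)     # hccm(): heteroskedasticity-corrected covariance matrix
library(lmtest)  # coeftest(): coefficient tests with a user-supplied vcov

# toy heteroskedastic regression, standing in for the GPA model above
set.seed(1)
x <- 1:40
y <- 3 * x + rnorm(40, sd = x)
fit <- lm(y ~ x)

robust_se <- sqrt(diag(hccm(fit, type = "hc0")))   # as in the listing
print(coeftest(fit, vcov = hccm(fit, type = "hc0")))
```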

Cost Function

The first part of the program below is an example of regressing data, plotting the data points and the fit,
and showing a few interesting points on the curve.

The second part of the program is an example of computing the ATC, AVC and MC curves and plotting those.

Note that MC could be computed discretely if the first derivative is not known. You would do this by calculating
the differences and plotting them against the midpoints.

mc <- TC[2:length(TC)] - TC[1:(length(TC) - 1)]
dQ <- 0.5 + Q[1:(length(Q) - 1)]
plot(dQ, mc)

Listing of the R program.

#  28 Oct 2012 D.S.Dixon


Q  <- c(1:10)
TC <- c(193,226,240,244,257,260,274,297,350,420)

myfit0 <- lm(TC~Q+I(Q^2)+I(Q^3))
output0 <- summary(myfit0)

B0 <- myfit0$coefficients[1]
B1 <- myfit0$coefficients[2]
B2 <- myfit0$coefficients[3]
B3 <- myfit0$coefficients[4]

inflex.x <- - B2 / (3 * B3)
inflex.y <- predict(myfit0, newdata=data.frame(Q=inflex.x))
inflex.slope <- B1 + 2 * B2 * inflex.x + 3 * B3 * inflex.x^2

cat("inflection at Q =",inflex.x,", TC =", inflex.y,"\n")

delta <- 0.01 * inflex.y
QA = - (B2 + delta) / (3 * B3)
QB = - (B2 - delta) / (3 * B3)

cat("QA = ",QA,", QB = ",QB,"\n")

plot(Q, TC)
points(c(inflex.x,inflex.x),c(inflex.y - 10,inflex.y + 10), pch=c(2,6))  
lines(c(inflex.x,inflex.x),c(inflex.y - 10,inflex.y + 10))  
points(c(QA,QB),predict(myfit0, data.frame(Q=c(QA,QB))), pch=4)  
legend(1,410,c("data","inflection","constant MC"),pch=c(1,2,4))

PQ <- predict(myfit0)
atc <- PQ / Q
avc <- (PQ - B0) / Q
# mc <- PQ[2:length(PQ)] - PQ[1:(length(PQ) - 1)]
# dQ <- 0.5 + Q[1:(length(Q) - 1)]
mc <- B1 + 2 * B2 * Q + 3 * B3 * Q^2

# lines(dQ,mc,lty=3)
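The plotting step for the three cost curves isn't in the listing; a sketch that reuses the same data and cubic fit:

```r
# ATC, AVC and MC on one set of axes (same data and cubic fit as the listing).
Q  <- 1:10
TC <- c(193, 226, 240, 244, 257, 260, 274, 297, 350, 420)
b  <- coef(lm(TC ~ Q + I(Q^2) + I(Q^3)))

PQ  <- b[1] + b[2] * Q + b[3] * Q^2 + b[4] * Q^3  # fitted total cost
atc <- PQ / Q                                     # average total cost
avc <- (PQ - b[1]) / Q                            # average variable cost
mc  <- b[2] + 2 * b[3] * Q + 3 * b[4] * Q^2       # marginal cost (dTC/dQ)

plot(Q, atc, type = "l", ylim = range(atc, avc, mc), ylab = "cost per unit")
lines(Q, avc, lty = 2)
lines(Q, mc, lty = 3)
legend("topright", legend = c("ATC", "AVC", "MC"), lty = 1:3)
```

Note that ATC and AVC differ by exactly the average fixed cost, b[1]/Q, so the gap between those two curves shrinks as Q grows.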

Econometric equations in WordPress

Here’s a great way to get econometrics equations into your blog posts: create a LaTeX array environment and put each coefficient at the left of a new column. That way, the standard errors line up in the row below.

It ends up looking like this:

\begin{array}{llllll}  \widehat{\log(wage)} = & 5.65\ + & .047\ educ\ + & .00078\ educ * pareduc\ + & .019\ exper\ + & .010\ tenure \\  & (.13) & (.010) & (.00021) & (.004) & (.003) \\  \multicolumn{3}{l}{n = 722, R^{2} = .169}  \end{array}

More on creating interaction dummies

Here are the dummies created all in one line (plus a line to add the column labels):

dummies <- 1*outer(2*cleandata$female + cleandata$athlete, 3:0, "==")
colnames(dummies) <- c("female_athlete", "female_nonathlete", "male_athlete", "male_nonathlete")

Interaction dummies

Here are a couple of quick and clean ways to add new interaction dummy variables to your data matrix so you can regress them the same way as other columns in the matrix.

# make a copy of your data, in case you make a mistake

newdata <- cleandata
# create new columns with the new dummy variables
newdata[,"female_athlete"]    <- 1*(cleandata$female == 1 & cleandata$athlete == 1)
newdata[,"female_nonathlete"] <- 1*(cleandata$female == 1 & cleandata$athlete == 0)
newdata[,"male_athlete"]      <- 1*(cleandata$female == 0 & cleandata$athlete == 1)
newdata[,"male_nonathlete"]   <- 1*(cleandata$female == 0 & cleandata$athlete == 0)

# outer is a little more exotic: it takes the outer equals ("==") product of female and the vector c(1,1,0)
#  and ANDs (&) it with the outer equals product of athlete and the vector c(1,0,1)
mat <- 1*(outer(cleandata$female, c(1,1,0), "==") & outer(cleandata$athlete, c(1,0,1), "=="))
colnames(mat) <- c("female_athlete", "female_nonathlete", "male_athlete")
newdata <- cbind(cleandata, mat)

# this does the same thing, but for all four interaction dummies
mat <- 1*(outer(cleandata$female, c(1,1,0,0), "==") & outer(cleandata$athlete, c(1,0,1,0), "=="))
colnames(mat) <- c("female_athlete", "female_nonathlete", "male_athlete", "male_nonathlete")
newdata <- cbind(cleandata, mat)