C4.2
(i) Using the same model as Problem 3.4, state and test the null hypothesis that the rank of law schools has no ceteris paribus effect on median starting salary.
The median starting salary for new law school graduates is determined by
log(salary) = β0 + β1 LSAT + β2 GPA + β3 log(libvol) + β4 log(cost) + β5 rank + u
where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending law school, and rank is a law school ranking (with rank = 1 being the best).
The null hypothesis that rank has no effect on log(salary) is
H0: β5 = 0
The alternative is
H1: β5 ≠ 0
which is a two-tailed test.
The regression results are:
Coefficients:
             Estimate   Std. Error t value Pr(>|t|)
(Intercept)  8.3432330  0.5325192  15.667  < 2e-16 ***
LSAT         0.0046964  0.0040105   1.171  0.24373
GPA          0.2475247  0.0900370   2.749  0.00683 **
llibvol      0.0949926  0.0332544   2.857  0.00499 **
lcost        0.0375544  0.0321061   1.170  0.24426
rank        -0.0033246  0.0003485  -9.541  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1124 on 130 degrees of freedom
(20 observations deleted due to missingness)
Multiple R-squared: 0.8417, Adjusted R-squared: 0.8356
F-statistic: 138.2 on 5 and 130 DF, p-value: < 2.2e-16
The estimated model is
log(salary) = 8.34   + 0.00470 LSAT + 0.248 GPA + 0.0950 log(libvol)
             (0.53)   (0.0040)       (0.090)     (0.033)
            + 0.0376 log(cost) - 0.00332 rank
             (0.032)            (0.00035)
n = 136, Adj. R2 = 0.8356
The t statistic on rank is -9.541. The regression has 130 degrees of freedom; the nearest tabulated two-tailed 1% critical value (120 degrees of freedom) is 2.617, and |t| = 9.541 far exceeds it. We therefore reject the null hypothesis: rank has a statistically significant ceteris paribus effect on median starting salary.
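As a quick check on the arithmetic, the t statistic and the exact critical value for 130 degrees of freedom can be reproduced from the reported coefficient and standard error. This is a minimal sketch in base R; the variable names are illustrative and not part of the original program, and because it starts from rounded output it matches the reported t value only to about two decimal places.

```r
# Coefficient and standard error on rank, copied from the regression output
b_rank  <- -0.0033246
se_rank <- 0.0003485
df      <- 130            # residual degrees of freedom from the output

# t statistic for H0: beta5 = 0
t_stat <- b_rank / se_rank

# Two-tailed 1% critical value and two-tailed p-value at 130 df
t_crit  <- qt(1 - 0.01 / 2, df)
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Reject H0 when |t| exceeds the critical value
reject <- abs(t_stat) > t_crit

print(c(t = t_stat, crit = t_crit, p = p_value))
print(reject)
```

The exact critical value at 130 degrees of freedom is slightly smaller than the tabulated 2.617 for 120 degrees of freedom, so using the table value is the conservative choice.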
(ii) Are features of the incoming class of students — namely, LSAT and GPA — individually or jointly significant for explaining salary? (Be sure to account for missing data on LSAT and GPA.)
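Based on the t statistics in the regression results, the coefficient on LSAT is not individually significant (p = 0.244), but the coefficient on GPA is significant at the 1% level (p = 0.007), so GPA would be expected to drive any joint significance. For the joint test, compare the unrestricted model with a restricted model that omits LSAT and GPA, both estimated on the same observations: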
unrestricted SSR = 1.6427, df = 130
restricted SSR = 1.8942, df = 132
F(2,130) = 9.95, 1% critical value for F(2,inf) = 4.61
Thus, at the 1% level, we reject the null hypothesis that LSAT and GPA are jointly insignificant. From the R linearHypothesis test,
Linear hypothesis test
Hypothesis:
LSAT = 0
GPA = 0
Model 1: restricted model
Model 2: lsalary ~ LSAT + GPA + llibvol + lcost + rank
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    132 1.8942
2    130 1.6427  2   0.25151 9.9517 9.518e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, at the 0.1% level, we reject the null hypothesis that LSAT and GPA are jointly insignificant.
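The manual F statistic above follows directly from the two sums of squared residuals. The following is a minimal sketch in base R, with the SSR and degrees-of-freedom values copied from the output and illustrative variable names that are not part of the original program:

```r
# Values copied from the restricted and unrestricted regressions
ssr_ur <- 1.6427; df_ur <- 130   # unrestricted: LSAT and GPA included
ssr_r  <- 1.8942; df_r  <- 132   # restricted: LSAT and GPA omitted

# Number of restrictions
q <- df_r - df_ur

# F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]
F_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Exact 1% critical value and p-value for F(2, 130)
F_crit  <- qf(0.99, q, df_ur)
p_value <- pf(F_stat, q, df_ur, lower.tail = FALSE)

print(c(F = F_stat, crit = F_crit, p = p_value))
```

The exact 1% critical value for F(2,130) is slightly larger than the tabulated F(2,inf) value of 4.61, but the statistic is well above either, so the conclusion does not change.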
(iii) Test whether incoming class size (clsize) or the size of the faculty (faculty) needs to be added to this equation; carry out a single test. (Be careful to account for missing data on clsize and faculty.)
To test the joint significance of these variables, first estimate an unrestricted model that includes them, then a restricted model that omits them. Both models use only the observations with no missing values for clsize or faculty, so they are fit to the same sample. The manual F test gives
unrestricted SSR = 1.5732, df = 123
restricted SSR = 1.5974, df = 125
F(2,123) = 0.9484, 10% critical value for F(2,120) = 2.35
Thus, at the 10% level, we fail to reject the null hypothesis that clsize and faculty are jointly insignificant. From the R linearHypothesis test,
Linear hypothesis test
Hypothesis:
clsize = 0
faculty = 0
Model 1: restricted model
Model 2: lsalary ~ LSAT + GPA + llibvol + lcost + rank + clsize + faculty
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    125 1.5974
2    123 1.5732  2  0.024259 0.9484 0.3902
Here the p-value is 0.39, so we fail to reject the null hypothesis at any conventional significance level: clsize and faculty are jointly insignificant and need not be added to the equation.
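The same SSR-based calculation can be wrapped in a small function that also returns the p-value. This is a sketch only: f_test_from_ssr is a hypothetical helper, not part of the original program or of any package, and because it uses the rounded SSRs shown above it agrees with the reported F only to about two decimal places.

```r
# Hypothetical helper: F test from restricted and unrestricted SSRs
f_test_from_ssr <- function(ssr_r, ssr_ur, df_r, df_ur) {
  q <- df_r - df_ur                                   # number of restrictions
  f <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)      # F statistic
  p <- pf(f, q, df_ur, lower.tail = FALSE)            # upper-tail p-value
  list(F = f, q = q, df = df_ur, p = p)
}

# SSRs and degrees of freedom copied from the clsize/faculty test
res <- f_test_from_ssr(ssr_r = 1.5974, ssr_ur = 1.5732, df_r = 125, df_ur = 123)

# Exact 10% critical value for F(2, 123)
f_crit_10 <- qf(0.90, res$q, res$df)

print(res)
print(res$F > f_crit_10)   # TRUE would mean rejecting H0 at the 10% level
```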
(iv) What factors might influence the rank of the law school that are not included in the salary regression?
There are salary differences based on gender and race/ethnicity across the labor market, and these may have some correlation with rank for a few law schools, but probably not over the entire data set. Individual programs are frequently ranked by the frequency and quality of publications by their faculty, so that is very likely to correlate with rank.
References
Problem and data from Wooldridge, Introductory Econometrics: A Modern Approach, 4e.
Listing of the R program.
#
# 1 Oct 2012 D.S.Dixon
#
#
# LAWSCH85.DES
#
# rank salary cost LSAT GPA libvol faculty age
# clsize north south east west lsalary studfac top10
# r11_25 r26_40 r41_60 llibvol lcost
#
# Obs: 156
#
# 1. rank law school ranking
# 2. salary median starting salary
# 3. cost law school cost
# 4. LSAT median LSAT score
# 5. GPA median college GPA
# 6. libvol no. volumes in lib., 1000s
# 7. faculty no. of faculty
# 8. age age of law sch., years
# 9. clsize size of entering class
# 10. north =1 if law sch in north
# 11. south =1 if law sch in south
# 12. east =1 if law sch in east
# 13. west =1 if law sch in west
# 14. lsalary log(salary)
# 15. studfac student-faculty ratio
# 16. top10 =1 if ranked in top 10
# 17. r11_25 =1 if ranked 11-25
# 18. r26_40 =1 if ranked 26-40
# 19. r41_60 =1 if ranked 41-60
# 20. llibvol log(libvol)
# 21. lcost log(cost)
#
mydata <- read.table("LAWSCH85.csv", sep=",", header = TRUE, na.strings = ".")
print(summary(mydata))
# eliminate missing data in LSAT and GPA
cleandata<-mydata[!is.na(mydata$LSAT) & !is.na(mydata$GPA),]
# unrestricted model
myfit0<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank, data=cleandata)
fitsum0 <- summary(myfit0)
SSR0 <- deviance(myfit0)
df0 <- df.residual(myfit0)
print(fitsum0)
print(paste("n = ",length(myfit0$residuals)))
# restricted model (LSAT=0, GPA=0)
myfit0Rest<-lm(lsalary~llibvol+lcost+rank, data=cleandata)
fitsum0Rest <- summary(myfit0Rest)
SSR0Rest <- deviance(myfit0Rest)
df0Rest <- df.residual(myfit0Rest)
print(fitsum0Rest)
print(paste("n = ",length(myfit0Rest$residuals)))
print(paste("SSR0 = ",SSR0))
print(paste("SSR0Rest = ",SSR0Rest))
q <- df0Rest - df0
nk1 <- df0
print(paste("df0 = ",df0))
print(paste("df0Rest = ",df0Rest))
print(paste("q = ",q))
print(paste("nk1 = ",nk1))
F0 <- ((SSR0Rest - SSR0)/q)/(SSR0/nk1)
print("old school")
print(paste("F = ",F0))
library(car)
# joint significance of LSAT and GPA
hypmatrix <- rbind(c(0,1,0,0,0,0),c(0,0,1,0,0,0))
rhs <- c(0,0)
myfit<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank, data=mydata)
hyp <- linearHypothesis(myfit, hypmatrix, rhs)
print("new school")
print(hyp)
# now eliminate missing data from clsize and faculty
cleanerdata<-cleandata[!is.na(cleandata$clsize) & !is.na(cleandata$faculty),]
print(summary(cleanerdata))
# unrestricted model
myfit1<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank+clsize+faculty, data=cleanerdata)
fitsum1 <- summary(myfit1)
SSR1 <- deviance(myfit1)
df1 <- df.residual(myfit1)
# restricted model (clsize=0, faculty=0)
myfit1Rest<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank, data=cleanerdata)
fitsum1Rest <- summary(myfit1Rest)
SSR1Rest <- deviance(myfit1Rest)
df1Rest <- df.residual(myfit1Rest)
print("Testing clsize and faculty")
print(paste("n = ",length(myfit1Rest$residuals)))
print(paste("SSR1 = ",SSR1))
print(paste("SSR1Rest = ",SSR1Rest))
q <- df1Rest - df1
nk1 <- df1
print(paste("df1 = ",df1))
print(paste("df1Rest = ",df1Rest))
print(paste("q = ",q))
print(paste("nk1 = ",nk1))
F1 <- ((SSR1Rest - SSR1)/q)/(SSR1/nk1)
print("old school")
print(paste("F = ",F1))
library(car)
# joint significance of clsize and faculty
hypmatrix <- rbind(c(0,0,0,0,0,0,1,0),c(0,0,0,0,0,0,0,1))
rhs <- c(0,0)
myfit<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank+clsize+faculty, data=mydata)
hyp <- linearHypothesis(myfit, hypmatrix, rhs)
print("new school")
print(hyp)
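The manual ("old school") F tests in the listing can also be cross-checked with anova() from base R, which compares nested lm fits estimated on the same sample. The sketch below uses simulated data with made-up variable names (y, x1, x2, x3), not the LAWSCH85 data, so that it runs on its own; it only illustrates that the SSR formula and anova() give the same F statistic.

```r
# Simulated data for illustration only; NOT the LAWSCH85 data
set.seed(1)
n   <- 100
x1  <- rnorm(n)
x2  <- rnorm(n)
x3  <- rnorm(n)
y   <- 1 + 0.5 * x1 + rnorm(n)    # x2 and x3 have no true effect
sim <- data.frame(y, x1, x2, x3)

# Unrestricted and restricted models on the same observations
fit_ur <- lm(y ~ x1 + x2 + x3, data = sim)
fit_r  <- lm(y ~ x1, data = sim)

# Manual F statistic from the sums of squared residuals
q        <- df.residual(fit_r) - df.residual(fit_ur)
F_manual <- ((deviance(fit_r) - deviance(fit_ur)) / q) /
  (deviance(fit_ur) / df.residual(fit_ur))

# Same test via anova(); the F statistic is in the second row
aov_tab <- anova(fit_r, fit_ur)
F_anova <- aov_tab$F[2]

print(c(manual = F_manual, anova = F_anova))
```

With the objects defined in the listing, the corresponding call would be anova(myfit0Rest, myfit0), since both models are fit to cleandata.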