# C5.2

Use the data in GPA2.RAW for this exercise.

(i) Using all 4,137 observations, estimate the equation

$colgpa = \beta_{0} + \beta_{1} hsperc + \beta_{2} sat + u$

and report the results in the standard form.

The regression results with the full data set are

$\begin{array}{llll} \widehat{colgpa} = & 1.392 & -0.01352\ hsperc & +0.001476\ sat\\ & (0.0715) & (0.000549) & (6.531\times 10^{-5}) \\ \multicolumn{4}{l}{n = 4137, R^{2} = 0.273, Adjusted\ R^{2} = 0.273}\\ \end{array}$

(ii) Reestimate the equation in part (i), using the first 2,070 observations.

The regression results with the first 2,070 observations are

$\begin{array}{llll} \widehat{colgpa} = & 1.436 & -0.01275\ hsperc & +0.001468\ sat\\ & (0.0978) & (0.000719) & (8.858\times 10^{-5}) \\ \multicolumn{4}{l}{n = 2070, R^{2} = 0.283, Adjusted\ R^{2} = 0.282}\\ \end{array}$

(iii) Find the ratio of the standard errors on hsperc from parts (i) and (ii). Compare this with the results from (5.10).

The ratio of the standard errors is

$0.000549/0.000719 = 0.764$

Equation (5.10) states that

$Se(\widehat{\beta_{j}}) \approx c_{j}/\sqrt{n}$

where $c_{j}$ is a positive constant that should be approximately the same for any $n$. That is, for two samples of sizes $n_{0}$ and $n_{1}$,

$Se_{0}(\widehat{\beta_{j}}) \sqrt{n_{0}} \approx Se_{1}(\widehat{\beta_{j}}) \sqrt{n_{1}}$

so that

$Se_{0}(\widehat{\beta_{j}}) / Se_{1}(\widehat{\beta_{j}}) \approx \sqrt{n_{1}} / \sqrt{n_{0}}$

For parts (i) and (ii)

$\sqrt{2070} / \sqrt{4137} = 0.707$

This is within about 7.5% of the ratio of the standard errors.
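As a quick numerical check of these figures (a Python sketch, separate from the R program below; the inputs are the standard errors and sample sizes reported above):

```python
import math

# standard errors on hsperc and sample sizes from the two regressions above
se_full, n_full = 0.000549, 4137
se_half, n_half = 0.000719, 2070

ratio_se = se_full / se_half          # observed ratio of standard errors
ratio_n = math.sqrt(n_half / n_full)  # ratio predicted by se ~ c/sqrt(n)
c_full = se_full * math.sqrt(n_full)  # the constant c_j for the full sample

print(round(ratio_se, 3))  # 0.764
print(round(ratio_n, 3))   # 0.707
print(round(c_full, 4))    # 0.0353
```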

For the full dataset, the constant is $c_{j} = Se(\widehat{\beta_{1}})\sqrt{n} = 0.000549\sqrt{4137} = 0.0353$.

To examine convergence, compare this value of $c_{j}$ with values from subsets of different sizes. That is, fit the model to two sets of 2068 points (half the dataset), then to three sets of 1379, four sets of 1034, five sets of 827,
and ten sets of 413. From each regression, multiply the standard error on $\beta_{1}$ by the square root of the subset size and plot these products against the subset size. The plot is shown here, and the sample R code is shown below.

## References

Problem and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

## Listing of the R program.

#
#  10 Oct 2012 D.S.Dixon
#

# GPA2.DES
#
# sat       tothrs    colgpa    athlete   verbmath  hsize     hsrank    hsperc
# female    white     black     hsizesq
#
#   Obs:  4137
#
#   1. sat                      combined SAT score
#   2. tothrs                   total hours through fall semest
#   3. colgpa                   GPA after fall semester
#   4. athlete                  =1 if athlete
#   5. verbmath                 verbal/math SAT score
#   6. hsize                    size graduating class, 100s
#   7. hsrank                   rank in graduating class
#   8. hsperc                   high school percentile, from top
#   9. female                   =1 if female
#  10. white                    =1 if white
#  11. black                    =1 if black
#  12. hsizesq                  hsize^2
#

source("RegReportLibrary.R")

# read the GPA2 data ("GPA2.csv" assumed, exported from GPA2.RAW as in the other listings)
mydata <- read.table("GPA2.csv", sep=",", header = TRUE, na.strings = ".")

myfit0 <- lm(colgpa~hsperc + sat, data=mydata)
output0 <- summary(myfit0)
print(output0)
wordpressFormat(myfit0)

nfull <- length(output0$residuals)
sefull <- output0$coefficients[2,2]

myfit1 <- lm(colgpa~hsperc + sat, data=mydata[1:2070,])
output1 <- summary(myfit1)
print(output1)
wordpressFormat(myfit1)

separt <- output1$coefficients[2,2]
npart <- length(output1$residuals)

cat("ratio of standard errors: ",(sefull/separt),"\n")
cat("ratio of square root observations ",((npart/nfull)^0.5),"\n")

N <- length(mydata$colgpa)
N1 <- as.integer(N/10)
N2 <- as.integer(N/5)
N3 <- as.integer(N/4)
N4 <- as.integer(N/3)
N5 <- as.integer(N/2)
mat <- matrix(nrow=25,ncol=2)
row <- 1

sqrtn <- sqrt(N1)
for(i in 0:9){
  start <- N1 * i + 1   # R indexing is 1-based
  end <- N1 * (i + 1)
  output1 <- summary(lm(colgpa~hsperc + sat, data=mydata[start:end,]))
  mat[row,] <- c(N1,output1$coefficients[2,2]*sqrtn)
  row <- row + 1
}

sqrtn <- sqrt(N2)
for(i in 0:4){
  start <- N2 * i + 1   # R indexing is 1-based
  end <- N2 * (i + 1)
  output1 <- summary(lm(colgpa~hsperc + sat, data=mydata[start:end,]))
  mat[row,] <- c(N2,output1$coefficients[2,2]*sqrtn)
  row <- row + 1
}

sqrtn <- sqrt(N3)
for(i in 0:3){
  start <- N3 * i + 1
  end <- N3 * (i + 1)
  output1 <- summary(lm(colgpa~hsperc + sat, data=mydata[start:end,]))
  mat[row,] <- c(N3,output1$coefficients[2,2]*sqrtn)
  row <- row + 1
}

sqrtn <- sqrt(N4)
for(i in 0:2){
  start <- N4 * i + 1   # R indexing is 1-based
  end <- N4 * (i + 1)
  output1 <- summary(lm(colgpa~hsperc + sat, data=mydata[start:end,]))
  mat[row,] <- c(N4,output1$coefficients[2,2]*sqrtn)
  row <- row + 1
}

sqrtn <- sqrt(N5)
for(i in 0:1){
  start <- N5 * i + 1
  end <- N5 * (i + 1)
  output1 <- summary(lm(colgpa~hsperc + sat, data=mydata[start:end,]))
  mat[row,] <- c(N5,output1$coefficients[2,2]*sqrtn)
  row <- row + 1
}

sqrtn <- sqrt(N)
mat[row,] <- c(N,output0$coefficients[2,2]*sqrtn)

## make a plot of constants
png("f0d62b075365c15a9efcad7a9c046938.png")
plot(mat, xlab="number of points per dataset", ylab="stderr")
dev.off()

cat("c = ",output0$coefficients[2,2]*sqrtn,"\n")


# C4.2

### (i) Using the same model as Problem 3.4, state and test the null hypothesis that the rank of law schools has no ceteris paribus effect on median starting salary.

The median starting salary for new law school graduates is determined by

log(salary) = β0 + β1 LSAT + β2 GPA + β3 log(libvol) + β4 log(cost) + β5 rank + u

where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending law school, and rank is a law school ranking (with rank = 1 being the best).

The null hypothesis that rank has no effect on log(salary) is

H0: β5 = 0

The alternative is

H1: β5 ≠ 0

which is a two-tailed test.

The regression results are:

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.3432330  0.5325192  15.667  < 2e-16 ***
LSAT         0.0046964  0.0040105   1.171  0.24373
GPA          0.2475247  0.0900370   2.749  0.00683 **
llibvol      0.0949926  0.0332544   2.857  0.00499 **
lcost        0.0375544  0.0321061   1.170  0.24426
rank        -0.0033246  0.0003485  -9.541  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1124 on 130 degrees of freedom
(20 observations deleted due to missingness)
Multiple R-squared: 0.8417, Adjusted R-squared: 0.8356
F-statistic: 138.2 on 5 and 130 DF,  p-value: < 2.2e-16


The estimated model is

log(salary) = 8.34 + 0.00470 LSAT + 0.248 GPA + 0.0950 log(libvol) + 0.0376 log(cost) – 0.00332 rank
             (0.53)   (0.0040)      (0.090)      (0.033)             (0.032)            (0.00035)

n = 136, Adj. R2 = 0.8356

Note from the regression results that the t-value on rank is -9.541, which is highly significant. The critical value at the 1% significance level with 120 degrees of freedom is 2.617, and clearly |t| > 2.617. Thus we reject the null hypothesis.
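The rejection logic can be spelled out in a few lines (a Python sketch, not part of the original R program; coefficient and standard error are taken from the output above, the critical value from Table G.2):

```python
# t statistic on rank and the 1% two-tailed critical value (~120 df)
beta_rank, se_rank = -0.0033246, 0.0003485
t_stat = beta_rank / se_rank
crit_1pct = 2.617   # two-tailed, 1% level, 120 df (Table G.2)

print(round(t_stat, 2))         # -9.54
print(abs(t_stat) > crit_1pct)  # True: reject H0
```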

### (ii) Are features of the incoming class of students — namely, LSAT and GPA — individually or jointly significant for explaining salary? (Be sure to account for missing data on LSAT and GPA.)

Based on the t-values in the regression results, the coefficient on LSAT is not individually significant, but GPA is significant at the 1% level, which suggests the two will also be jointly significant.

unrestricted SSR = 1.6427, df = 130
restricted SSR = 1.8942, df = 132
F(2,130) = 9.95, 1% critical value for F(2,inf) = 4.61
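The F statistic follows directly from the restricted and unrestricted sums of squared residuals; a Python sketch of the computation (values from the block above):

```python
# F test of H0: the coefficients on LSAT and GPA are both zero
ssr_ur, df_ur = 1.6427, 130   # unrestricted model
ssr_r, df_r = 1.8942, 132     # restricted model (LSAT, GPA dropped)
q = df_r - df_ur              # number of restrictions

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
print(round(F, 2))            # 9.95
print(F > 4.61)               # True: reject at the 1% level
```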


Thus, at the 1% level, we reject the null hypothesis that LSAT and GPA are jointly insignificant. From the R linearHypothesis test,

    Linear hypothesis test

Hypothesis:
LSAT = 0
GPA = 0

Model 1: restricted model
Model 2: lsalary ~ LSAT + GPA + llibvol + lcost + rank

Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    132 1.8942
2    130 1.6427  2   0.25151 9.9517 9.518e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Here, even at the 0.1% level, we reject the null hypothesis that LSAT and GPA are jointly insignificant.

### (iii) Test whether incoming class size (clsize) or the size of the faculty (faculty) needs to be added to this equation; carry out a single test. (Be careful to account for missing data on clsize and faculty.)

To test the joint significance of these variables, first create a model that includes them. Then, do the test manually with an unrestricted regression of this model, then a restricted model that omits these two variables.

For the unrestricted model

unrestricted SSR = 1.5732, df = 123
restricted SSR = 1.5974, df = 125
F(2,123) = 0.9484, 10% critical value for F(2,120) = 2.35
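The same computation applies here (a Python sketch; SSRs and degrees of freedom from above — the small difference from the reported 0.9484 is rounding in the SSRs):

```python
# F test of H0: the coefficients on clsize and faculty are both zero
ssr_ur, df_ur = 1.5732, 123   # unrestricted model
ssr_r, df_r = 1.5974, 125     # restricted model (clsize, faculty dropped)
q = df_r - df_ur              # number of restrictions

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
print(round(F, 2))            # 0.95
print(F < 2.35)               # True: fail to reject at the 10% level
```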


Thus, at the 10% level, we fail to reject the null hypothesis that clsize and faculty are jointly insignificant. From the R linearHypothesis test,

    Linear hypothesis test

Hypothesis:
clsize = 0
faculty = 0

Model 1: restricted model
Model 2: lsalary ~ LSAT + GPA + llibvol + lcost + rank + clsize + faculty

Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    125 1.5974
2    123 1.5732  2  0.024259 0.9484 0.3902


Here the p-value is 0.39, so we fail to reject the null hypothesis that clsize and faculty are jointly insignificant at any conventional significance level.

### (iv) What factors might influence the rank of the law school that are not included in the salary regression?

There are salary differences based on gender and race/ethnicity across the labor market, and these may correlate with rank for a few law schools, but probably not over the entire data set. Individual programs are frequently ranked by the frequency and quality of their faculty's publications, so publication record is very likely to correlate with rank.

## References

Problem and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

## Listing of the R program.

#
#  1 Oct 2012 D.S.Dixon
#
#
# LAWSCH85.DES
#
# rank      salary    cost      LSAT      GPA       libvol    faculty   age
# clsize    north     south     east      west      lsalary   studfac   top10
# r11_25    r26_40    r41_60    llibvol   lcost
#
#   Obs:   156
#
#   1. rank                     law school ranking
#   2. salary                   median starting salary
#   3. cost                     law school cost
#   4. LSAT                     median LSAT score
#   5. GPA                      median college GPA
#   6. libvol                   no. volumes in lib., 1000s
#   7. faculty                  no. of faculty
#   8. age                      age of law sch., years
#   9. clsize                   size of entering class
#  10. north                    =1 if law sch in north
#  11. south                    =1 if law sch in south
#  12. east                     =1 if law sch in east
#  13. west                     =1 if law sch in west
#  14. lsalary                  log(salary)
#  15. studfac                  student-faculty ratio
#  16. top10                    =1 if ranked in top 10
#  17. r11_25                   =1 if ranked 11-25
#  18. r26_40                   =1 if ranked 26-40
#  19. r41_60                   =1 if ranked 41-60
#  20. llibvol                  log(libvol)
#  21. lcost                    log(cost)
#

print(summary(mydata))

# eliminate missing data in LSAT and GPA
cleandata<-mydata[!is.na(mydata$LSAT) & !is.na(mydata$GPA),]

# unrestricted model
myfit0<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank, data=cleandata)
fitsum0 <- summary(myfit0)

SSR0 <- deviance(myfit0)
df0 <- df.residual(myfit0)

print(fitsum0)
print(paste("n = ",length(myfit0$residuals)))

# restricted model (LSAT=0, GPA=0)
myfit0Rest<-lm(lsalary~llibvol+lcost+rank, data=cleandata)
fitsum0Rest <- summary(myfit0Rest)
SSR0Rest <- deviance(myfit0Rest)
df0Rest <- df.residual(myfit0Rest)
print(fitsum0Rest)
print(paste("n = ",length(myfit0Rest$residuals)))

print(paste("SSR0 = ",SSR0))
print(paste("SSR0Rest = ",SSR0Rest))
q <- df0Rest - df0
nk1 <- df0
print(paste("df0 = ",df0))
print(paste("df0Rest = ",df0Rest))
print(paste("q = ",q))
print(paste("nk1 = ",nk1))
F0 <- ((SSR0Rest - SSR0)/q)/(SSR0/nk1)
print("old school")
print(paste("F = ",F0))

library(car)

# joint significance of LSAT and GPA
hypmatrix <- rbind(c(0,1,0,0,0,0),c(0,0,1,0,0,0))
rhs <- c(0,0)

myfit<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank, data=mydata)
hyp <- linearHypothesis(myfit, hypmatrix, rhs)

print("new school")
print(hyp)

# now eliminate missing data from clsize and faculty
cleanerdata<-cleandata[!is.na(cleandata$clsize) & !is.na(cleandata$faculty),]

print(summary(cleanerdata))

# unrestricted model
myfit1<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank+clsize+faculty, data=cleanerdata)
fitsum1 <- summary(myfit1)
SSR1 <- deviance(myfit1)
df1 <- df.residual(myfit1)

# restricted model (clsize=0, faculty=0)
myfit1Rest<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank, data=cleanerdata)
fitsum1Rest <- summary(myfit1Rest)
SSR1Rest <- deviance(myfit1Rest)
df1Rest <- df.residual(myfit1Rest)

print("Testing clsize and faculty")

print(paste("n = ",length(myfit1Rest$residuals)))
print(paste("SSR1 = ",SSR1))
print(paste("SSR1Rest = ",SSR1Rest))
q <- df1Rest - df1
nk1 <- df1
print(paste("df1 = ",df1))
print(paste("df1Rest = ",df1Rest))
print(paste("q = ",q))
print(paste("nk1 = ",nk1))
F1 <- ((SSR1Rest - SSR1)/q)/(SSR1/nk1)
print("old school")
print(paste("F = ",F1))

library(car)

# joint significance of clsize and faculty
hypmatrix <- rbind(c(0,0,0,0,0,0,1,0),c(0,0,0,0,0,0,0,1))
rhs <- c(0,0)
myfit<-lm(lsalary~LSAT+GPA+llibvol+lcost+rank+clsize+faculty, data=mydata)
hyp <- linearHypothesis(myfit, hypmatrix, rhs)
print("new school")
print(hyp)


# Homework 7

# 4.2

### Consider an equation to explain salaries of CEOs in terms of annual firm sales, return on equity (roe, in percentage form), and return on the firm's stock (ros, in percentage form):

log(salary) = β0 + β1 log(sales) + β2 roe + β3 ros + u

### (i) In terms of the model parameters, state the null hypothesis that, after controlling for sales and roe, ros has no effect on CEO salary. State the alternative that better stock market performance increases a CEO's salary.

The null hypothesis that ros has no effect is

H0: β3 = 0

The alternative that better stock market performance increases a CEO's salary is

H1: β3 > 0

That is, a one-tailed test.

### (ii) Using the data in CEOSAL1.RAW, the following equation was obtained by OLS. By what percentage is salary predicted to increase if ros increases by 50 points? Does ros have a practically large effect on salary?

log(salary) = 4.32 + .280 log(sales) + .0174 roe + .00024 ros
             (.32)   (.035)            (.0041)     (.00054)
n = 206, R2 = .283

For an increase of ros by 50 points, the proportional effect on salary is .00024(50) = 0.012, or 1.2%. Practically, this is a very small change in salary for a very dramatic change in stock performance.

### (iii) Test the null hypothesis that ros has no effect on salary against the alternative that ros has a positive effect. Carry out the test at the 10% significance level.

From Table G.2, the 10% critical value for a one-tailed test with infinite degrees of freedom is 1.282. The t-statistic of ros is .00024/.00054 = 0.44, which is much less than the critical value. Thus, we fail to reject the null hypothesis at the 10% significance level.

### (iv) Would you include ros in a final model explaining CEO compensation in terms of firm performance? Explain.

I would include it. Since the other variables are highly significant, it is unlikely that including ros is having any negative impact. Many readers will assume that ros affects CEO salary, so addressing the question, even if a bit ambiguously, is instructive.

## References

Problem and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.


# Homework 6

# 3.12

### The following equation represents the effects of tax revenue mix on subsequent employment growth for a population of counties in the United States:

growth = β0 + β1 shareP + β2 shareI + β3 shareS + other factors

where growth is the percentage change in employment from 1980 to 1990, shareP is the share of property taxes in total tax revenue, shareI is the share of income tax revenues, and shareS is the share of sales tax revenues. All of these variables are measured in 1980. The omitted share, shareF, includes fees and miscellaneous taxes. By definition, the four shares add up to one. Other factors would include expenditure on education, infrastructure, and so on (all measured in 1980).

### (i) Why must we omit one of the tax share variables from the equation?

Because all of the shares add up to one, specifying three of the shares unambiguously determines the fourth. Including all four would therefore introduce perfect collinearity, which the Gauss-Markov assumptions rule out. Note that varying any one of the included three shares ceteris paribus means, by definition, that the omitted share is also being varied. Were all four included, there would be no way to vary any one of the shares ceteris paribus, and therefore no way to interpret the coefficients.

### (ii) Give a careful interpretation of β1.

β1 is the marginal effect of the property tax share on employment growth: the change, in percentage points of employment growth, per unit change in shareP, holding the other shares and factors fixed. A full unit change in shareP makes little sense, however, since a share must lie between zero and one. The quantity (0.01)β1 is the change in employment growth when the property tax share rises by one percentage point of total taxes. Note that a ceteris paribus increase in shareP of one percentage point necessarily means a one percentage point decrease in shareF.


# C3.6

### Use the data set in WAGE2.RAW for this problem. As usual, be sure all of the following regressions contain an intercept.

### (i) Run a simple regression of IQ on educ to obtain the slope coefficient, say δ1.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  53.6872     2.6229   20.47   <2e-16 ***
educ          3.5338     0.1922   18.39   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.9 on 933 degrees of freedom
Multiple R-squared: 0.2659, Adjusted R-squared: 0.2652
F-statistic: 338 on 1 and 933 DF,  p-value: < 2.2e-16

The estimated model is

IQ = 53.6872 + 3.5338 educ

so δ1 = 3.5338

### (ii) Run the simple regression of log(wage) on educ, and obtain the slope coefficient, β1.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.973062   0.081374   73.40   <2e-16 ***
educ        0.059839   0.005963   10.04   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4003 on 933 degrees of freedom
Multiple R-squared: 0.09742, Adjusted R-squared: 0.09645
F-statistic: 100.7 on 1 and 933 DF,  p-value: < 2.2e-16

The estimated model is

ln(wage) = 5.9731 + 0.059839 educ

so β1 = 0.059839

### (iii) Run the multiple regression of log(wage) on educ and IQ, and obtain the slope coefficients, β1 and β2, respectively.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.6582876  0.0962408  58.793  < 2e-16 ***
educ        0.0391199  0.0068382   5.721 1.43e-08 ***
IQ          0.0058631  0.0009979   5.875 5.87e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3933 on 932 degrees of freedom
Multiple R-squared: 0.1297, Adjusted R-squared: 0.1278
F-statistic: 69.42 on 2 and 932 DF,  p-value: < 2.2e-16

The estimated model is

ln(wage) = 5.6582876 + 0.039120 educ + 0.0058631 IQ

so

β1 = 0.039120
β2 = 0.0058631

### (iv) Verify that the simple regression slope from part (ii) equals β1 + β2δ1.

β1 + β2δ1 = 0.039120 + (0.0058631)(3.5338) = 0.059839

identical with the result in part (ii).

## References

Problem and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

### Listing of the R program used to answer these questions.

#
#  1 Oct 2012 D.S.Dixon
#
#
# WAGE2.DES
#
# wage      hours     IQ        KWW       educ      exper     tenure    age
# married   black     south     urban     sibs      brthord   meduc     feduc
# lwage
#
#   Obs:   935
#
#   1. wage                     monthly earnings
#   2. hours                    average weekly hours
#   3. IQ                       IQ score
#   4. KWW                      knowledge of world work score
#   5. educ                     years of education
#   6. exper                    years of work experience
#   7. tenure                   years with current employer
#   8. age                      age in years
#   9. married                  =1 if married
#  10. black                    =1 if black
#  11. south                    =1 if live in south
#  12. urban                    =1 if live in SMSA
#  13. sibs                     number of siblings
#  14. brthord                  birth order
#  15. meduc                    mother's education
#  16. feduc                    father's education
#  17. lwage                    natural log of wage
#

mydata <- read.table("WAGE2.csv", sep=",", header = TRUE, na.strings = ".")
print(summary(mydata))

myfit0<-lm(IQ~educ, data=mydata)
print(summary(myfit0))

myfit1<-lm(lwage~educ, data=mydata)
print(summary(myfit1))

myfit2<-lm(lwage~educ+IQ, data=mydata)
print(summary(myfit2))


# Homework 5

# 3.4

### The median starting salary for new law school graduates is determined by

log(salary) = β0 + β1 LSAT + β2 GPA + β3 log(libvol) + β4 log(cost) + β5 rank + u

where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending law school, and rank is a law school ranking (with rank = 1 being the best).

### (i) Explain why we expect β5 ≤ 0.

The lower the rank the higher the perceived quality, so we expect the marginal effect of rank to be negative.

### (ii) What signs do you expect for the other slope parameters? Justify your answers.

Ceteris paribus, better students get better salaries, so positive values are expected for the coefficients on LSAT and GPA (β1 and β2, respectively). Ceteris paribus, graduates from better schools get better salaries, so positive values are also expected for the coefficients on log(libvol) and log(cost) (β3 and β4, respectively), as these are proxies for overall school quality.

### (iii) Using the data in LAWSCH85.RAW, the estimated equation is

log(salary) = 8.34 + .0047 LSAT + .248 GPA + .095 log(libvol) + .038 log(cost) – .0033 rank
n = 136, R2 = .842

What is the predicted ceteris paribus difference in salary for schools with a median GPA different by one point? (Report your answer as a percentage.)

The coefficient on GPA is .248, meaning that, ceteris paribus, a one point difference in GPA is predicted to result in a 24.8% change in starting salary.

### (iv) Interpret the coefficient on the variable log(libvol).

The coefficient on log(libvol) is the library-volume elasticity of salary. That is, ceteris paribus, a one percent change in the number of volumes in the library is predicted to result in a 0.095% change in starting salary.

### (v) Would you say it is better to attend a higher ranked law school? How much is a difference in ranking of 20 worth in terms of predicted starting salary?

For "better" measured as starting salary, there is a strong correlation between rank and log(salary). An improvement in rank of 20 places, ceteris paribus, is predicted to result in Δlog(salary) = –0.0033 (–20) = .066, or a 6.6% higher starting salary.


# C3.4

### (i) Obtain the minimum, maximum, and average values for the variables atndrte, priGPA, and ACT.

column     minimum   maximum     mean
atndrte      6.25     100.00    81.71
priGPA       0.857      3.930    2.587
ACT         13.00      32.00    22.51

### (ii) Estimate the model

atndrte = β0 + β1 priGPA + β2 ACT + u

and write the results in equation form. Interpret the intercept. Does it have a useful meaning?

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   75.700      3.884   19.49   <2e-16 ***
priGPA        17.261      1.083   15.94   <2e-16 ***
ACT           -1.717      0.169  -10.16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.38 on 677 degrees of freedom
Multiple R-squared: 0.2906, Adjusted R-squared: 0.2885
F-statistic: 138.7 on 2 and 677 DF,  p-value: < 2.2e-16

The estimated model is

atndrte = 75.7 + 17.26 priGPA – 1.717 ACT

The intercept is the attendance rate for a student with zero GPA and zero ACT score. Neither of these is likely. Furthermore, the data set has no values anywhere near these, as shown in part (i) above.

### (iii) Discuss the estimated slope coefficients. Are there any surprises?

All of the coefficients are significant at any level. The coefficient on priGPA is large and positive, indicating a strong correlation between grades and attendance. The coefficient on ACT is of comparable magnitude (given the ranges of priGPA and ACT) yet negative. This indicates that a test of aptitude taken a year or two before college is negatively correlated with attendance. There are many possible interpretations, including that students with higher aptitude don't need to attend classes as frequently, or that talented high school students are lazy college students, or that talent goes away after high school graduation.

### (iv) What is the predicted atndrte if priGPA = 3.65 and ACT = 20? What do you make of this result? Are there any students in the sample with these values of the explanatory variables?

The predicted atndrte is

atndrte = 75.7 + 17.26 (3.65) – 1.717 (20) = 104.3

Given that this represents an attendance rate greater than one hundred percent, it can be interpreted as being within the residuals for attendance of 100%. There is one student in the data set with priGPA = 3.65 and ACT = 20, and that student has atndrte = 87.5.

### (v) If Student A has priGPA = 3.1 and ACT = 21 and Student B has priGPA = 2.1 and ACT = 26, what is the predicted difference in their attendance rates?

The difference is

Δatndrte = 17.26 (3.1 – 2.1) – 1.717 (21 – 26) = 25.8

## References

Problems and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

### Listing of the R program used to answer these questions.

#
#  25 Sep 2012 D.S.Dixon
#
#
# ATTEND.DES
#
# attend    termGPA   priGPA    ACT       final     atndrte   hwrte     frosh
# soph      skipped   stndfnl
#
#   Obs:   680
#
#   1. attend                   classes attended out of 32
#   2. termGPA                  GPA for term
#   3. priGPA                   cumulative GPA prior to term
#   4. ACT                      ACT score
#   5. final                    final exam score
#   6. atndrte                  percent classes attended
#   7. hwrte                    percent homework turned in
#   8. frosh                    =1 if freshman
#   9. soph                     =1 if sophomore
#  10. skipped                  number of classes skipped
#  11. stndfnl                  (final - mean)/sd
#

mydata <- read.table("attend.csv", sep=",", header = TRUE, na.strings = ".")

print("atndrte:")
print(summary(mydata$atndrte))
print("priGPA:")
print(summary(mydata$priGPA))

print("ACT:")
print(summary(mydata$ACT))

myfit<-lm(atndrte~priGPA+ACT, data=mydata)

print(summary(myfit))

print(predict(myfit,data.frame(priGPA=3.65,ACT=20)))

print(mydata[mydata$priGPA==3.65 & mydata$ACT==20,])

print(predict(myfit,data.frame(priGPA=3.1,ACT=21)))

print(predict(myfit,data.frame(priGPA=2.1,ACT=26)))


# Problem C2.6

### (i) Do you think each additional dollar spent has the same effect on the pass rate, or does a diminishing effect seem more appropriate? Explain.

With cross-sectional data covering a wide range of spending levels, ceteris paribus, an additional dollar spent at a school in an upper-middle-class neighborhood is likely to have much less impact than a dollar spent at a low-income school. This argues for a diminishing effect of dollars spent on math test pass rate.

### (ii) In the population model math10 = β0 + β1 ln(expend) + u, argue that β1/10 is the percentage point change in math10 given a 10% increase in expend.

Note that

Δmath10 = β1 Δln(expend) ≈ β1 (Δexpend/expend)

so if

Δexpend/expend = 10% = 1/10

then

Δmath10 ≈ β1 (1/10) = β1/10

### (iii) Use the data in MEAP93.RAW to estimate the model from part (ii). Report the estimated equation in the usual way, including the sample size and R-squared.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -69.341     26.530  -2.614 0.009290 **
lexpend       11.164      3.169   3.523 0.000475 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.35 on 406 degrees of freedom
Multiple R-squared: 0.02966,    Adjusted R-squared: 0.02727
F-statistic: 12.41 on 1 and 406 DF,  p-value: 0.0004752

N = 408


The estimated equation is

math10 = -69.3 + 11.16 ln(expend)
n = 408, R2 = 0.0297

### (iv) How big is the estimated spending effect? Namely, if spending increases by 10%, what is the estimated percentage point increase in math10?

For a 10% increase in spending, math10 is predicted to increase by about 11.16/10 ≈ 1.1 percentage points.
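A quick check of the β1/10 approximation against the exact log change (a Python sketch using the coefficient estimated above):

```python
import math

beta1 = 11.164                  # estimated coefficient on ln(expend)
exact = beta1 * math.log(1.10)  # exact effect of a 10% spending increase
approx = beta1 / 10             # the beta1/10 approximation

print(round(exact, 2))   # 1.06
print(round(approx, 2))  # 1.12
```

Both indicate an increase of roughly 1.1 percentage points in math10.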

### (v) One might worry that regression analysis can produce fitted values for math10 that are greater than 100. Why is this not much of a worry in this data set?

The summary for lexpend is

Min.   :8.111
1st Qu.:8.248
Median :8.330
Mean   :8.370
3rd Qu.:8.447
Max.   :8.912


so that even for the highest value of lexpend

math10 = -69.3 + 11.16 (8.912) = 30.2

That is, the highest predicted score is slightly more than 30%. Similarly, the lowest predicted score is

math10 = -69.3 + 11.16 (8.111) = 21.2
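These two bounds can be verified directly from the fitted equation (a Python sketch using the full-precision coefficients from the output above):

```python
# fitted math10 at the extremes of lexpend in the sample
beta0, beta1 = -69.341, 11.164
lexpend_min, lexpend_max = 8.111, 8.912

lo = beta0 + beta1 * lexpend_min
hi = beta0 + beta1 * lexpend_max

print(round(lo, 1))  # 21.2
print(round(hi, 1))  # 30.2
```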

## References

Problems and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

### Listing of the R program used to answer these questions.

#
#  30 Sep 2012 D.S.Dixon
#
# MEAP93.DES
#
# lnchprg   enroll    staff     expend    salary    benefits  droprate
# gradrate math10    sci11     totcomp   ltotcomp  lexpend   lenroll   lstaff    bensal
# lsalary
#
#   Obs:   408
#
#   1. lnchprg                  perc. of studs. in sch. lunch prog.
#   2. enroll                   school enrollment
#   3. staff                    staff per 1000 students
#   4. expend                   expend. per stud., $
#   5. salary                   avg. teacher salary, $
#   6. benefits                 avg. teacher benefits, $
#   7. droprate                 school dropout rate, perc
#   8. gradrate                 school graduation rate, perc
#   9. math10                   perc studs passing MEAP math
#  10. sci11                    perc studs passing MEAP science
#  11. totcomp                  salary + benefits
#  12. ltotcomp                 log(totcomp)
#  13. lexpend                  log of expend
#  14. lenroll                  log(enroll)
#  15. lstaff                   log(staff)
#  16. bensal                   benefits/salary
#  17. lsalary                  log(salary)
#

mydata <- read.table("MEAP93.csv", sep=",", header = TRUE, na.strings = ".")

myfit<-lm(math10~lexpend, data=mydata)
print(summary(myfit))
print(paste("N =",length(myfit$residuals)))

print(summary(mydata))


# Problem C2.4

## References

Problems and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

### Listing of the R program used to answer these questions.

#
#  17 Sep 2012 D.S.Dixon
#
#
# WAGE2.DES
#
#  Obs:   935
#
#  1. wage                     monthly earnings
#  2. hours                    average weekly hours
#  3. IQ                       IQ score
#  4. KWW                      knowledge of world work score
#  5. educ                     years of education
#  6. exper                    years of work experience
#  7. tenure                   years with current employer
#  8. age                      age in years
#  9. married                  =1 if married
# 10. black                    =1 if black
# 11. south                    =1 if live in south
# 12. urban                    =1 if live in SMSA
# 13. sibs                     number of siblings
# 14. brthord                  birth order
# 15. meduc                    mother's education
# 16. feduc                    father's education
# 17. lwage                    natural log of wage
#

print(summary(mydata))

print(sd(mydata$IQ))

## make a histogram of wage
png("682eeb7d6fc88c363ace4cc5d2ddcd3a.png")
hist(mydata$wage,breaks=50)
dev.off()

## make a histogram of IQ
png("5c6185bdce0b6fbb6191ecf199d18ea3.png")
hist(mydata$IQ,breaks=50)
dev.off()

## References

Problems and data from Wooldridge Introductory Econometrics: A Modern Approach, 4e.

## R Program

Here’s the R program I used to answer these questions.

#
#  28 Aug 2012 D.S.Dixon
#
# This is the dataset for the first EC 460 homework assignment
#

# the description is from BWGHT.DES
#
#  1. faminc                   1988 family income, $1000s
#  2. cigtax                   cig. tax in home state, 1988
#  3. cigprice                 cig. price in home state, 1988
#  4. bwght                    birth weight, ounces
#  5. fatheduc                 father's yrs of educ
#  6. motheduc                 mother's yrs of educ
#  7. parity                   birth order of child
#  8. male                     =1 if male child
#  9. white                    =1 if white
# 10. cigs                     cigs smked per day while preg
# 11. lbwght                   log of bwght
# 12. bwghtlbs                 birth weight, pounds
# 13. packs                    packs smked per day while preg
# 14. lfaminc                  log(faminc)
#

mydata <- read.table("BWGHT.raw", header = FALSE, na.strings = ".",
    col.names=c("faminc", "cigtax", "cigprice", "bwght", "fatheduc",
                "motheduc", "parity", "male", "white", "cigs", "lbwght",
                "bwghtlbs", "packs", "lfaminc"))

print(paste("There are ", length(mydata$bwght), " samples in the data"))

print(paste("The mean number of cigarettes per day is ", mean(mydata$cigs)))

## make a histogram of cigarettes per day
png("8990a8170515ef9b5730fe9573cf4d6c.png")
hist(mydata$cigs,breaks=50)
dev.off()

print(paste("There are ", length(mydata$cigs[mydata$cigs>0]), " smokers in the sample data"))

print(paste("Considering only smokers, the mean number  of cigarettes per day is ", mean(mydata$cigs[mydata$cigs>0])))

print(paste("The mean of fatheduc is ", mean(na.omit(mydata$fatheduc))))
print(paste("There are ", (length(mydata$bwght) - length(na.omit(mydata$fatheduc)))," samples with missing fatheduc data"))
print(paste("The mean of faminc is ", mean(mydata$faminc)))
print(paste("The standard deviation of faminc is ", sd(mydata$faminc)))

## make a histogram of family income
png("d4f18cdd2204524640311f5ab124e086.png")
hist(mydata$faminc,breaks=50)
dev.off()