# C5.2

**Use the data in GPA2.RAW for this exercise.**

**(i) Using all 4,137 observations, estimate the equation**

$$colgpa = \beta_0 + \beta_1\, hsperc + \beta_2\, sat + u$$

**and report the results in the standard form.**

The regression results with the full data set are

**(ii) Reestimate the equation in part (i), using the first 2,070 observations.**

The regression results with the first 2,070 observations are

**(iii) Find the ratio of the standard errors on hsperc from parts (i) and (ii). Compare this with the results from (5.10).**

The ratio of the standard errors is

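Equation (5.10) is

$$\mathrm{se}(\hat{\beta}_j) \approx \frac{c_j}{\sqrt{n}},$$

where $c_j$ is a positive constant that should be approximately the same for any *n*. That is,

$$c_j \approx \sqrt{n_1}\,\mathrm{se}_1(\hat{\beta}_j) \approx \sqrt{n_2}\,\mathrm{se}_2(\hat{\beta}_j),$$

so that

$$\frac{\mathrm{se}_1(\hat{\beta}_j)}{\mathrm{se}_2(\hat{\beta}_j)} \approx \sqrt{\frac{n_2}{n_1}}.$$

For parts (i) and (ii), with $n_1 = 4137$ and $n_2 = 2070$,

$$\sqrt{\frac{2070}{4137}} \approx 0.707.$$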

This is within about 7.5% of the ratio of the standard errors.
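The predicted ratio can be checked with a couple of lines of R. This is a minimal sketch: the variable names are illustrative, and the only inputs are the two sample sizes from parts (i) and (ii). Under (5.10), the full-sample standard error divided by the half-sample standard error should be close to the square root of the inverse ratio of the sample sizes.

```r
# Sample sizes from parts (i) and (ii)
n_full <- 4137
n_part <- 2070

# Under (5.10), se_full / se_part is approximately sqrt(n_part / n_full)
predicted_ratio <- sqrt(n_part / n_full)
cat("predicted ratio of standard errors:", round(predicted_ratio, 4), "\n")
```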

For the full dataset, the constant for hsperc (the standard error multiplied by the square root of the number of observations) is 0.0353.
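As a rough illustration of how the constant is used, the sketch below turns 0.0353 back into the standard errors that (5.10) would predict for the two sample sizes in parts (i) and (ii). The variable names are illustrative and are not taken from the listing at the end of the post.

```r
# Constant reported above for the full data set
c_hsperc <- 0.0353

# Sample sizes from parts (i) and (ii)
n <- c(full = 4137, part = 2070)

# (5.10): se is approximately c / sqrt(n)
se_predicted <- c_hsperc / sqrt(n)
print(signif(se_predicted, 3))
```

The smaller sample gets the larger predicted standard error, which is the pattern part (iii) asks about.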

To examine convergence, compare this value of the constant with values from subsets of different sizes. That is, estimate the model on two sets of 2068 observations (half the dataset), then on three sets of 1379, four sets of 1034, five sets of 827, and ten sets of 413. From each regression, multiply the standard error for hsperc by the square root of the number of observations and plot these products against the number of observations. The plot is shown here, and the sample R code is shown below.
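The same procedure can be tried without the GPA2 file. The sketch below is only an illustration under stated assumptions: it uses simulated data (not the GPA2 data), and the names `sim`, `scaled_se`, and `results`, as well as the coefficients used to generate `y`, are all hypothetical. It splits the rows into consecutive blocks of equal size, fits the same kind of two-regressor model on each block, and scales each standard error by the square root of the block size.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in data (NOT the GPA2 data): y depends linearly on x1 and x2
N <- 4000
sim <- data.frame(x1 = rnorm(N), x2 = rnorm(N))
sim$y <- 1 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(N)

# Standard error of the x1 coefficient, scaled by sqrt(n), for rows start..end
scaled_se <- function(start, end) {
  fit <- summary(lm(y ~ x1 + x2, data = sim[start:end, ]))
  fit$coefficients["x1", "Std. Error"] * sqrt(end - start + 1)
}

# Split the rows into k consecutive blocks of equal size, for several k
results <- do.call(rbind, lapply(c(10, 5, 4, 2, 1), function(k) {
  n <- N %/% k
  data.frame(n = n,
             c_hat = sapply(0:(k - 1),
                            function(i) scaled_se(n * i + 1, n * (i + 1))))
}))
print(results)
```

If (5.10) holds, the `c_hat` column should stay roughly constant across block sizes, with more scatter among the smaller blocks.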

## References

Problem and data from Wooldridge *Introductory Econometrics: A Modern Approach, 4e*.

## Listing of the R program.

```
#
# 10 Oct 2012 D.S.Dixon
#
# GPA2.DES
#
# sat tothrs colgpa athlete verbmath hsize hsrank hsperc
# female white black hsizesq
#
# Obs: 4137
#
# 1. sat combined SAT score
# 2. tothrs total hours through fall semest
# 3. colgpa GPA after fall semester
# 4. athlete =1 if athlete
# 5. verbmath verbal/math SAT score
# 6. hsize size graduating class, 100s
# 7. hsrank rank in graduating class
# 8. hsperc high school percentile, from top
# 9. female =1 if female
# 10. white =1 if white
# 11. black =1 if black
# 12. hsizesq hsize^2
#
source("RegReportLibrary.R")
mydata <- read.table("gpa2.csv", sep=",", header = TRUE, na.strings = ".")
# part (i): full data set
myfit0 <- lm(colgpa~hsperc + sat, data=mydata)
output0 <- summary(myfit0)
print(output0)
wordpressFormat(myfit0)
nfull <- length(output0$residuals)
# row 2 of the coefficient table is hsperc; column 2 is the standard error
sefull <- output0$coefficients[2,2]
# part (ii): first 2,070 observations
myfit1 <- lm(colgpa~hsperc + sat, data=mydata[1:2070,])
output1 <- summary(myfit1)
print(output1)
wordpressFormat(myfit1)
separt <- output1$coefficients[2,2]
npart <- length(output1$residuals)
# part (iii): compare the ratio of standard errors with sqrt(npart/nfull)
cat("ratio of standard errors: ",(sefull/separt),"\n")
cat("ratio of square root observations ",((npart/nfull)^0.5),"\n")
N <- length(mydata$colgpa)
# one row per regression: 10 + 5 + 4 + 3 + 2 subsets, plus the full data set
mat <- matrix(nrow=25,ncol=2)
row <- 1
# for each k, split the data into k consecutive blocks of n = floor(N/k) rows
for(k in c(10, 5, 4, 3, 2)){
  n <- as.integer(N/k)
  sqrtn <- sqrt(n)
  for(i in 0:(k - 1)){
    # R rows are indexed from 1, so block i covers rows n*i + 1 to n*(i + 1)
    start <- n * i + 1
    end <- n * (i + 1)
    output1 <- summary(lm(colgpa~hsperc + sat, data=mydata[start:end,]))
    # store the block size and se(hsperc) * sqrt(n)
    mat[row,] <- c(n,output1$coefficients[2,2]*sqrtn)
    row <- row + 1
  }
}
sqrtn <- sqrt(N)
mat[row,] <- c(N,output0$coefficients[2,2]*sqrtn)
## make a plot of constants
png("f0d62b075365c15a9efcad7a9c046938.png")
plot(mat, xlab="number of points per dataset", ylab="stderr * sqrt(n)")
dev.off()
cat("c = ",output0$coefficients[2,2]*sqrtn,"\n")
```