I am using the Hitters dataset in R. I fit a linear regression predicting Salary from all other covariates, with training sample sizes varying from 20 to 75, and calculated the average test/training errors for each size:
data("Hitters", package = 'ISLR')
Hitters = na.omit(Hitters)
set.seed(1)
train.idx = sample(1:nrow(Hitters), 75,replace=FALSE)
train = Hitters[train.idx,-20]
test = Hitters[-train.idx,-20]
errs <- rep(NA,56)
for (ii in 20:75){
train.idx = sample(1:nrow(Hitters), ii,replace=FALSE)
train = Hitters[train.idx,-20]
test = Hitters[-train.idx,-20]
train.lm <- lm(Salary ~ ., data = train)
train.pred <- predict(train.lm, train)
# predict() takes newdata=, not data=; an unknown data= argument is silently ignored
test.pred <- predict(train.lm, newdata = test)
errs[ii-19] <- mean((test.pred - test$Salary)^2)
}
errs
Now I am trying to do the same with ridge regression, using the samples I created before and a regularization parameter of 20. I tried:
x_train = model.matrix(Salary~., train)[,-1]
x_test = model.matrix(Salary~., test)[,-1]
y_train = train$Salary
y_test = test$Salary
#cv.out = cv.glmnet(x_train,y_train, alpha = 0)
#lam = cv.out$lambda.min
errs.train <- rep(NA, 56)
for (ii in 20:75){
ridge_mod = glmnet(x_train, y_train, alpha=0, lambda = 20)
ridge_pred = predict(ridge_mod, newx = x_test)
#errs.test[ii] <- mean((ridge_pred - y_test)^2)
errs.train[ii-19] <- mean((ridge_pred - y_train)^2)
}
errs.train
But all the errors are coming out the same. How can I fix this?
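One likely cause: x_train, y_train, and x_test are built once, outside the loop, from whatever train and test held when the first loop finished (the size-75 sample). Nothing inside the ridge loop depends on ii, so every iteration refits the same model on the same data and produces the same error. The error line also compares predictions for x_test against y_train, which mixes the two sets. Below is a minimal sketch of one way around both issues, assuming the glmnet package is installed. The names sizes, idx.list, dat, x_all, and y_all are introduced here for illustration and do not appear in the original code.

```r
library(glmnet)  # ridge regression via glmnet(alpha = 0)

data("Hitters", package = "ISLR")
Hitters <- na.omit(Hitters)
dat <- Hitters[, -20]  # drop column 20, as the original code does

set.seed(1)
sizes <- 20:75
# Draw one set of training indices per sample size and keep them,
# so the same samples can be reused for both lm and ridge.
idx.list <- lapply(sizes, function(n) sample(nrow(dat), n))

# Build the design matrix once on the full data so every
# train/test split shares the same columns.
x_all <- model.matrix(Salary ~ ., dat)[, -1]
y_all <- dat$Salary

errs.train <- rep(NA, length(sizes))
errs.test  <- rep(NA, length(sizes))
for (k in seq_along(sizes)) {
  idx <- idx.list[[k]]
  # Refit on this iteration's training rows only.
  ridge_mod <- glmnet(x_all[idx, ], y_all[idx], alpha = 0, lambda = 20)
  pred.train <- predict(ridge_mod, newx = x_all[idx, ])
  pred.test  <- predict(ridge_mod, newx = x_all[-idx, ])
  # Compare each set of predictions with the responses from the same rows.
  errs.train[k] <- mean((pred.train - y_all[idx])^2)
  errs.test[k]  <- mean((pred.test - y_all[-idx])^2)
}
errs.train
errs.test
```

Looping over the stored idx.list in the linear regression as well would put both models on identical samples, which is what makes the two error curves comparable.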