Wednesday, 9 August 2017

Fitting a model across a group of subsetted data frames

Hello, I hope you can help me.

I want to generate the best possible model for a data frame I have. As an example, I can use iris. With a function I defined, I have generated 100 random train/test splits of this data frame. With the default of 70%, each split has 105 training rows and 45 test rows out of the 150 rows in iris.

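data(iris)

# Split a data frame at random into training and test rows.
foo <- function(dat, train_percent = 0.7) {
  n     <- seq_len(nrow(dat))
  train <- sample(n, floor(train_percent * length(n)))
  test  <- sample(setdiff(n, train))
  list(train = dat[train, ], test = dat[test, ])
}

# Keep the 100 splits in a list so they can be reused later.
splits <- replicate(100, foo(iris), simplify = FALSE)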

Ideally, I would like to test the model I get from each training set on the remaining (held-out) rows of the same split, for all 100 splits.

As a model example, I am trying to find out whether my final model should include interactions or not. How can I link this model to the 100 subsets I generated?

# Use bare column names so the model can be refit or used to predict on other data frames.
model <- glm(Sepal.Length ~ Sepal.Width * Species,
             family = poisson, data = iris)
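
One possible way to link the model to the splits is to refit it on each training set and score it on the matching test set. The following is only a sketch under some assumptions: the names make_split, fit_and_score and rmse are hypothetical, the seed is arbitrary, RMSE on the held-out rows is just one possible score, and suppressWarnings() is there because the Poisson family warns about the non-integer response.

```r
data(iris)
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same idea as foo() in the question: 70% training rows, the rest for testing.
make_split <- function(dat, train_percent = 0.7) {
  idx <- sample(nrow(dat), floor(train_percent * nrow(dat)))
  list(train = dat[idx, ], test = dat[-idx, ])
}
splits <- replicate(100, make_split(iris), simplify = FALSE)

# Fit on the training rows, then predict the held-out rows.
fit_and_score <- function(s) {
  fit <- suppressWarnings(
    glm(Sepal.Length ~ Sepal.Width * Species,
        family = poisson, data = s$train)
  )
  pred <- predict(fit, newdata = s$test, type = "response")
  sqrt(mean((s$test$Sepal.Length - pred)^2))  # held-out RMSE
}

rmse <- vapply(splits, fit_and_score, numeric(1))
summary(rmse)  # distribution of the held-out error over the 100 splits
```

Because the formula uses bare column names, predict() can evaluate the fitted model on s$test; with iris$Sepal.Length in the formula this would not work.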

My understanding is that if the interaction is significant across the 100 subsets, my model should include it. I also understand that model construction should follow a stepwise method, but repeating that 100 times seems complicated, so I don't fully understand whether this is possible in R. I have seen this kind of procedure in SPSS and Maxent, but those programs test models in other ways.
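
To make the idea of checking the interaction over the 100 subsets concrete, here is a minimal sketch, not a definitive procedure. On each split it fits the model with and without the interaction, compares them with a likelihood-ratio test via anova(), and also compares their held-out RMSE. The 0.05 cut-off, the seed, and the names p_values and rmse_gain are assumptions for illustration. Sepal.Length is not a count, so the Poisson family is kept only to mirror the question, and its p-values should be read with caution.

```r
data(iris)
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

n_rep     <- 100
p_values  <- numeric(n_rep)  # LRT p-value for the interaction, per split
rmse_gain <- numeric(n_rep)  # additive RMSE minus interaction RMSE, per split

rmse <- function(fit, newdata) {
  pred <- predict(fit, newdata = newdata, type = "response")
  sqrt(mean((newdata$Sepal.Length - pred)^2))
}

for (i in seq_len(n_rep)) {
  idx   <- sample(nrow(iris), floor(0.7 * nrow(iris)))
  train <- iris[idx, ]
  test  <- iris[-idx, ]

  # suppressWarnings(): the Poisson family warns about non-integer responses.
  additive <- suppressWarnings(
    glm(Sepal.Length ~ Sepal.Width + Species, family = poisson, data = train))
  interact <- suppressWarnings(
    glm(Sepal.Length ~ Sepal.Width * Species, family = poisson, data = train))

  # Likelihood-ratio test of the nested models; row 2 holds the comparison.
  lrt <- anova(additive, interact, test = "Chisq")
  p_values[i] <- lrt[2, "Pr(>Chi)"]

  # Positive values mean the interaction model predicts the test rows better.
  rmse_gain[i] <- rmse(additive, test) - rmse(interact, test)
}

mean(p_values < 0.05)  # share of splits where the interaction is significant
mean(rmse_gain > 0)    # share of splits where it improves held-out error
```

The same loop could call step() on each training set if a stepwise search is wanted, at the cost of 100 separate searches.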

I hope you can help me or give me some advice.
