I know this question is quite common, but I have read the earlier questions and I still can't understand why we need a validation set in addition to a training set and a test set. Some people use only a training set and a test set, so what does the validation set add, and how do we use it? For example, when imputing missing data, should I impute these three sets separately or together? Also, is it okay to just split my data by indexing, for example, in a dataset with 100000 rows, like this:
train <- data[1:70000, ]
validation <- data[70001:90000, ]
test <- data[90001:nrow(data), ]
or is there a more correct and efficient way to split my data in R? Thank you!
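For comparison, a random split in base R could be sketched as below. This is only a sketch under assumptions: the data frame `data`, its columns `x` and `y`, the seed, and the 70/20/10 proportions are all hypothetical stand-ins, and mean imputation is just one simple choice. The point it illustrates is that the rows are shuffled before cutting, so that any ordering in the file does not leak into the split, and that the imputation value is computed on the training set only and then reused for the other two sets.

```r
set.seed(42)  # assumption: any fixed seed, used only for reproducibility

# Hypothetical stand-in for the real dataset: 100000 rows, some values missing
data <- data.frame(x = rnorm(100000), y = rnorm(100000))
data$x[sample(nrow(data), 500)] <- NA

# Shuffle the row indices once, then cut them into 70% / 20% / 10%
n <- nrow(data)
idx <- sample(n)
n_train <- round(0.7 * n)
n_valid <- round(0.2 * n)

train      <- data[idx[1:n_train], ]
validation <- data[idx[(n_train + 1):(n_train + n_valid)], ]
test       <- data[idx[(n_train + n_valid + 1):n], ]

# Mean imputation: compute the fill value on the training set only,
# then apply that same value to all three sets
x_mean <- mean(train$x, na.rm = TRUE)
train$x[is.na(train$x)]           <- x_mean
validation$x[is.na(validation$x)] <- x_mean
test$x[is.na(test$x)]             <- x_mean
```

With a split like this, the validation set would typically be used to compare models or tune settings, while the test set is left untouched until the final evaluation.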