Thursday, April 25, 2019

When to use train, validation and test sets

I know this question is quite common, but I have read all the questions asked about it before and I still don't understand why we also need a validation set. I know people sometimes use only a training set and a test set, so why do we also need a validation set, and how do we use it? For example, when imputing missing data, do I impute these three sets separately or not? Also, is it okay to just split my data by indexing? For example, in a dataset with 100000 rows, I just do this:

train <- data[1:70000, ]                # rows 1 to 70000
validation <- data[70001:90000, ]       # rows 70001 to 90000
test <- data[90001:nrow(data), ]        # rows 90001 to the last row

Or is there a more correct and efficient way to split my data in R? Thank you!
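One common way to split in base R is to shuffle the row numbers with `sample()` before cutting them into blocks, so that each set is a random subset rather than a contiguous slice. The sketch below assumes a simulated stand-in data frame named `data` and a 70/20/10 split. The proportions, the seed and the column names are illustrative choices, not requirements.

```r
# Hypothetical stand-in for the real data: 100000 rows, two numeric columns.
set.seed(42)
data <- data.frame(x = rnorm(100000), y = rnorm(100000))

n <- nrow(data)
idx <- sample(n)                  # random permutation of the row numbers 1..n
n_train <- round(0.7 * n)         # 70% of rows for training
n_valid <- round(0.2 * n)         # 20% of rows for validation

train_idx <- idx[1:n_train]
valid_idx <- idx[(n_train + 1):(n_train + n_valid)]
test_idx  <- idx[(n_train + n_valid + 1):n]    # remaining 10% for testing

train      <- data[train_idx, ]
validation <- data[valid_idx, ]
test       <- data[test_idx, ]
```

Because every row number appears in exactly one block, the three sets never overlap. If the rows have a meaningful order (a time series, for example), a contiguous split like the one in the question may be the better choice, since shuffling would let later observations inform predictions about earlier ones.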
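A typical way the three sets are used: every candidate model is fit on the training set, the validation set is used to choose among the candidates, and the test set is used only once at the end to estimate how well the chosen model generalizes. A separate test set is needed because the validation score of the winning model is optimistically biased, since that score was used to pick it. The sketch below is illustrative only: the simulated data, the use of polynomial degree as the single tuning choice, and the helper `rmse()` are all assumptions.

```r
set.seed(7)
# Hypothetical data: y depends non-linearly on x.
n <- 1000
data <- data.frame(x = runif(n, -3, 3))
data$y <- sin(data$x) + rnorm(n, sd = 0.3)

idx <- sample(n)
train      <- data[idx[1:700], ]
validation <- data[idx[701:900], ]
test       <- data[idx[901:1000], ]

# Hypothetical helper: root mean squared error of a model on a data frame.
rmse <- function(model, df) sqrt(mean((df$y - predict(model, newdata = df))^2))

# Fit one candidate per polynomial degree on the training set only,
# and score each candidate on the validation set.
degrees <- 1:8
valid_rmse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  rmse(fit, validation)
})
best_degree <- degrees[which.min(valid_rmse)]

# Refit the chosen model and evaluate it once on the untouched test set.
final_fit <- lm(y ~ poly(x, best_degree), data = train)
test_rmse <- rmse(final_fit, test)
```

When only a training set and a test set are used, the test set ends up playing the validation role as soon as it influences any modelling choice, and its score is then no longer an unbiased estimate.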
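On imputation, a common recommendation is to estimate any imputation parameters from the training set alone and then apply those same values unchanged to the validation and test sets, so that information from held-out rows does not leak into training. The sketch below assumes simple mean imputation of a single column `x` in simulated data; `impute_x()` is a hypothetical helper, and mean imputation is only a stand-in for whatever method is actually used. The same fit-on-train, apply-to-all principle carries over to model-based imputation.

```r
set.seed(1)
# Hypothetical data: column x has 100 missing values.
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
data$x[sample(1000, 100)] <- NA

idx <- sample(nrow(data))
train      <- data[idx[1:700], ]
validation <- data[idx[701:900], ]
test       <- data[idx[901:1000], ]

# Estimate the fill value from the training rows only.
train_mean_x <- mean(train$x, na.rm = TRUE)

# Hypothetical helper: replace missing values of x with a fixed value.
impute_x <- function(df, value) {
  df$x[is.na(df$x)] <- value
  df
}

# Apply the same training-set value to all three splits.
train      <- impute_x(train, train_mean_x)
validation <- impute_x(validation, train_mean_x)
test       <- impute_x(test, train_mean_x)
```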
