I have a question about splitting data into train, test, and validation sets with createDataPartition() from the caret package. I found a solution that works perfectly for a 60/20/20 split. However, I don't see how to adapt it to other ratios while still ensuring that the three sets do not overlap. For example, I would like to split into 80/10/10, or any other ratio.
library(caret)
# Draw a random, stratified sample containing a fraction p of the data
idx.train <- createDataPartition(y = iris$Species, p = 0.8, list = FALSE)
# training set with p = 0.8
train <- iris[idx.train, ]
# test set with p = 0.2 (drop all observations with train indices)
test <- iris[-idx.train, ]
# Draw a random, stratified sample of fraction p from the training set
idx.validation <- createDataPartition(y = train$Species, p = 0.25, list = FALSE)
# validation set with p = 0.8 * 0.25 = 0.2
validation <- train[idx.validation, ]
# final train set with p = 0.8 * 0.75 = 0.6
train60 <- train[-idx.validation, ]
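One possible way to generalize this to other ratios such as 80/10/10 is sketched below. It is not the only option: it splits off the training set first and then divides the remaining rows, with the second p expressed relative to that remainder. The names p.train, p.validation, p.test, p.rel and rest are made up for this illustration, and the seed is arbitrary.

```r
library(caret)

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Target fractions (illustrative values; they should sum to 1)
p.train <- 0.8
p.validation <- 0.1
p.test <- 0.1

# Step 1: stratified draw of the training rows from the full data
idx.train <- createDataPartition(y = iris$Species, p = p.train, list = FALSE)
train <- iris[idx.train, ]
rest <- iris[-idx.train, ]  # everything not used for training

# Step 2: split the remainder; p is relative to 'rest', not to the full data
p.rel <- p.validation / (p.validation + p.test)  # 0.1 / 0.2 = 0.5
idx.validation <- createDataPartition(y = rest$Species, p = p.rel, list = FALSE)
validation <- rest[idx.validation, ]
test <- rest[-idx.validation, ]
```

Because each row is either selected by an index vector or dropped by its negation, the three sets cannot overlap. With small classes the realized proportions may deviate slightly from the targets, since the sampling is done per class.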