Thursday, January 26, 2017

How to adapt data-split sizes with createDataPartition()

I have a question about splitting data into training, test and validation sets with createDataPartition(). I found a solution that works perfectly for a 60/20/20 split. However, I don't see how to adapt it to other proportions, e.g. 80/10/10 or any other ratio, while still ensuring that the three sets do not overlap.

# Load caret, which provides createDataPartition()
library(caret)
# Draw a random sample of 80% of the rows (p = 0.8), stratified by Species
idx.train <- createDataPartition(y = iris$Species, p = 0.8, list = FALSE)
# training set with p = 0.8
train <- iris[idx.train, ]
# test set with p = 0.2 (drop all observations with training indices)
test <- iris[-idx.train, ]
# Draw a second stratified sample of 25% of the training rows
idx.validation <- createDataPartition(y = train$Species, p = 0.25, list = FALSE)
# validation set with p = 0.8 * 0.25 = 0.2
validation <- train[idx.validation, ]
# final training set with p = 0.8 * 0.75 = 0.6
train60 <- train[-idx.validation, ]
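One possible way to generalise this, sketched here under my own assumptions rather than as the definitive answer: draw the training share from the full data first, then split only the held-out rows a second time, with the validation proportion rescaled to what is left, p_val / (1 - p_train). For 80/10/10 that second proportion is 0.1 / 0.2 = 0.5. Because the second call samples only from rows the first call dropped (negative indexing), the three sets cannot overlap. The helper split_three_way() and its arguments are hypothetical names; only createDataPartition() comes from caret.

```r
# Minimal sketch: three-way stratified split built on caret::createDataPartition()
library(caret)

# Hypothetical helper: split `data` into train/validation/test sets,
# stratified by the column named `y`. p_train and p_val are fractions of
# the full data set; the test set gets the rest (1 - p_train - p_val).
split_three_way <- function(data, y, p_train = 0.8, p_val = 0.1) {
  stopifnot(p_train > 0, p_val > 0, p_train + p_val < 1)

  # Step 1: stratified sample of p_train of all rows for training
  idx_train <- createDataPartition(y = data[[y]], p = p_train, list = FALSE)
  train <- data[idx_train, ]
  rest  <- data[-idx_train, ]          # held-out rows, disjoint from train

  # Step 2: rescale p_val to the held-out rows, e.g. 0.1 / (1 - 0.8) = 0.5.
  # round() strips floating-point noise (in double precision this ratio is
  # slightly above 0.5), since sample sizes are derived from n * p per class.
  p_val_rest <- round(p_val / (1 - p_train), 10)
  idx_val <- createDataPartition(y = rest[[y]], p = p_val_rest, list = FALSE)

  list(train      = train,
       validation = rest[idx_val, ],
       test       = rest[-idx_val, ])  # whatever validation did not take
}

set.seed(42)  # make the random split reproducible
splits <- split_three_way(iris, "Species", p_train = 0.8, p_val = 0.1)
sapply(splits, nrow)
```

The same helper reproduces the 60/20/20 split with p_train = 0.6 and p_val = 0.2. Because sampling is done per class, the set sizes only approximate the requested ratios when the class counts are not divisible by them.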
