Wednesday, January 17, 2018

Training and test sets for a machine learning model in R

I have been using a machine learning model to forecast a variable from time-series data. My question is about how I create the training and test sets, which I do with the following script:

library(caret)
ind <- createDataPartition(Data$variable, p = 2/3, list = FALSE)
train <- Data[ind, ]
test <- Data[-ind, ]

The rows of both sets are drawn at random from the whole data set, with 2/3 used for training and 1/3 for testing.

Is this technique correct? In my view, the predictions will show an inflated r^2 because this is a time-series data set (highly correlated), so each randomly chosen test point sits next to training points in time. Would it be better to hold out the last 1/3 of the data instead (an ordered, chronological split)?
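
For concreteness, here is a minimal sketch of the alternative I am asking about, holding out the last 1/3 of the rows. It uses base R only and assumes the rows of Data are already sorted from oldest to newest; the toy data frame, its time column, and the name n_train are made up for illustration and are not part of my real data.

```r
# Hypothetical stand-in for Data; rows are assumed to be in chronological order
Data <- data.frame(
  time     = 1:9,
  variable = c(2.1, 2.4, 2.2, 2.8, 3.0, 2.9, 3.3, 3.6, 3.5)
)

n       <- nrow(Data)
n_train <- floor(2 * n / 3)           # size of the first 2/3 of the series

train <- Data[seq_len(n_train), ]     # earliest observations, used for fitting
test  <- Data[(n_train + 1):n, ]      # most recent 1/3, held out for testing
```

With this split every test observation comes after every training observation, so the model is never evaluated on points that lie between its training points.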

Thanks,
Regards.
