Monday, August 24, 2015

Good results when training and cross-validating a model, but test data set shows poor results

My problem is that my model achieves very good results during training and cross-validation, but performs poorly when I test it on a different data set.

I have a model that was trained and then evaluated with cross-validation. It shows AUC=0.933, TPR=0.90 and FPR=0.04.
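For reference, metrics like these can be computed with scikit-learn as in the sketch below (the labels and probabilities are illustrative placeholders, not my actual data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Illustrative true labels and predicted positive-class probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.6, 0.8, 0.9, 0.3, 0.4, 0.2, 0.7])

# AUC is threshold-independent
auc = roc_auc_score(y_true, y_prob)

# TPR and FPR derived from the confusion matrix at a 0.5 threshold
tn, fp, fn, tp = confusion_matrix(y_true, (y_prob >= 0.5).astype(int)).ravel()
tpr = tp / (tp + fn)  # true positive rate (sensitivity)
fpr = fp / (fp + tn)  # false positive rate
print(auc, tpr, fpr)
```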

Looking at the learning curves (error and score) and the deviance curve, I see no sign of overfitting:

[Figures: learning curve (score), learning curve (error), deviance curve]

The problem is that when I test this model on a different test data set, I get poor results, nothing like my previous ones: AUC=0.52, TPR=0.165 and FPR=0.105.

I trained the model with a Gradient Boosting Classifier, using learning_rate=0.01, max_depth=12, max_features='auto', min_samples_leaf=3, n_estimators=750.
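For concreteness, here is a minimal sketch of that configuration, assuming scikit-learn's GradientBoostingClassifier (the training data below is a synthetic placeholder standing in for my real set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic placeholder data standing in for the real training set
X_train, y_train = make_classification(n_samples=500, n_features=20, random_state=0)

model = GradientBoostingClassifier(
    learning_rate=0.01,
    max_depth=12,
    max_features='auto',  # the value I used; recent scikit-learn versions use 'sqrt' instead
    min_samples_leaf=3,
    n_estimators=750,
)
model.fit(X_train, y_train)
```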

I used SMOTE to balance the classes; it is a binary classification model. I vectorized my categorical attributes, and I used 75% of my data set for training and 25% for testing (a sketch of this setup follows below). The model has a very low training error and a low cross-validation test error, so I don't think it is overfitted, and the very low training error also suggests there are no outliers in the training and cross-validation test sets. What can I do from here to find the problem? Thanks
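Here is a minimal sketch of the preprocessing described above, assuming imbalanced-learn's SMOTE for class balancing and pandas get_dummies for vectorizing the categorical attributes (the data and column names are illustrative, not my real features):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Vectorizing a categorical attribute with one-hot encoding (illustrative column)
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green']})
print(pd.get_dummies(df, columns=['color']))

# Synthetic imbalanced data standing in for the full feature matrix
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 75% train / 25% test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Balance classes with SMOTE on the training split only, so the test split
# keeps its original class distribution
X_train_bal, y_train_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
```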
