I am working on a classification problem with a highly imbalanced but very large data set (more than 2 million samples). My question is: if I train on a subsample of only 15% of the data, chosen so that the classes are balanced, is it OK to use the remaining samples for testing, even though they are highly imbalanced? Of course I cannot rely on plain accuracy, so I would use a measure that takes the imbalance into account, e.g. balanced accuracy.
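To make the setup concrete, here is a minimal sketch in plain Python on toy data. The helper names (`balanced_train_test_split`, `balanced_accuracy`), the class counts, and the "always predict the majority class" baseline are all made up for illustration; they are not from any particular library. Balanced accuracy is computed here as the mean of the per-class recalls.

```python
import random
from collections import defaultdict


def balanced_train_test_split(labels, n_per_class, seed=0):
    """Put n_per_class randomly chosen samples of each class into the
    training set; every remaining sample goes to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx = []
    for y, idxs in by_class.items():
        if len(idxs) < n_per_class:
            raise ValueError(f"class {y!r} has only {len(idxs)} samples")
        train_idx.extend(rng.sample(idxs, n_per_class))

    train_set = set(train_idx)
    test_idx = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train_idx), test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy data (hypothetical): 950 majority samples, 50 minority samples.
labels = [0] * 950 + [1] * 50
train_idx, test_idx = balanced_train_test_split(labels, n_per_class=25)

# The test set keeps the original imbalance.
y_test = [labels[i] for i in test_idx]

# A useless baseline that always predicts the majority class.
always_majority = [0] * len(y_test)
plain_acc = sum(t == p for t, p in zip(y_test, always_majority)) / len(y_test)
bal_acc = balanced_accuracy(y_test, always_majority)

print(f"plain accuracy:    {plain_acc:.3f}")
print(f"balanced accuracy: {bal_acc:.3f}")
```

On the imbalanced test set the majority-class baseline gets a high plain accuracy but only 0.5 balanced accuracy, which is why I would report the latter (or another imbalance-aware measure) rather than accuracy.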