Monday, 23 November 2020

When making a train/test split for model training, does it pick an equal number of samples from each class?

Let's say I have a CNN model to classify the handwritten numbers 1 to 10. I am using a dataset with 20,000 samples and I make a 50:50 train/test split.

That leaves me with 10,000 samples each for training and testing. Will it automatically pick exactly 1,000 images from each class for each set, or will it only approximate that?

I am working on a similar problem (with different numbers of samples and classes), and I noticed that the test set is not evenly split across classes. For example, it contains 1,010 ones but only 990 twos.
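To make the observation concrete, here is a minimal sketch in plain Python, not the code of any particular library. It assumes the split is a simple random shuffle of all samples followed by a cut, and compares that with a per-class (stratified) split. The function names `random_split` and `stratified_split`, the seed, and the synthetic labels (2,000 samples for each of the 10 classes) are assumptions made for illustration.

```python
import random
from collections import Counter


def random_split(labels, test_fraction=0.5, seed=0):
    """Shuffle all indices together, then cut once.

    Per-class counts in each part are only approximately proportional.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = int(len(labels) * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def stratified_split(labels, test_fraction=0.5, seed=0):
    """Shuffle and cut each class separately, so proportions are preserved."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = int(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Synthetic labels: classes 1..10 with 2,000 samples each (20,000 total).
labels = [c for c in range(1, 11) for _ in range(2000)]

_, test_idx = random_split(labels)
random_counts = Counter(labels[i] for i in test_idx)

_, strat_idx = stratified_split(labels)
stratified_counts = Counter(labels[i] for i in strat_idx)

print("random shuffle:", sorted(random_counts.items()))
print("stratified:    ", sorted(stratified_counts.items()))
```

Under these assumptions, the random shuffle typically gives per-class test counts that scatter around 1,000, while the stratified version gives exactly 1,000 per class. If the split comes from scikit-learn's `train_test_split`, passing the labels through its `stratify` parameter should give the stratified behaviour.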

Is this normal? I couldn't find any documentation verifying this. My dataset is large enough that the small discrepancy is irrelevant, but I still would like to confirm.

Thanks!
