How do I write a function that takes a data set (I am using iris) and splits it into a test set and a training set, with each category (label) represented proportionally in both?
def train_test_split(data, labels, n, test_proportion)
where n is the number of categories.
The output test_data should contain a fraction test_proportion of the data (for example, 0.2 for 20%), with test_labels containing the correct labels for the data in test_data. train_data should then contain the remaining data, with train_labels containing the labels for the feature vectors in train_data.
Both test_data and train_data should have equal proportions of each of the n categories. For example, for n = 3, both test_data and train_data should consist of 1/3 observations of category 0, 1/3 of category 1 and 1/3 of category 2, even though test_data and train_data may contain different numbers of entries (when test_proportion is different from 0.5).
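One possible way to get this behaviour is to split each category separately and then combine the pieces. The sketch below is only an illustration under a few assumptions that are not stated in the question: labels are the integers 0 to n - 1, the return order is (train_data, train_labels, test_data, test_labels), the per-category test count is rounded to the nearest integer, and the extra seed parameter is a hypothetical addition for reproducibility. It uses only the standard library and returns plain lists.

```python
import random


def train_test_split(data, labels, n, test_proportion, seed=None):
    """Split data so each category keeps the same share in train and test.

    Assumptions (not from the question): labels are integers 0..n-1, and
    `seed` is a hypothetical extra parameter to make the shuffle repeatable.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []

    for category in range(n):
        # Indices of all observations that belong to this category.
        idx = [i for i, label in enumerate(labels) if label == category]
        rng.shuffle(idx)

        # Take test_proportion of this category for the test set.
        n_test = round(test_proportion * len(idx))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    train_data = [data[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    test_data = [data[i] for i in test_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_data, train_labels, test_data, test_labels
```

Because the split is done per category, a data set with equally sized categories (as in iris) ends up with equal category shares in both outputs. The rows come out grouped by category, so they may need shuffling again before training.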