Friday, November 6, 2020

Non-overlapping data in train/test/validation split in Python

I'm trying to write a function for a deep learning task: satellite image classification. I have searched through a lot of libraries and haven't found what I need. I tried scikit-learn, but I feel that it is not what I need.

I want to split the data into train, validation and test sets with no overlapping groups (each ID appears in exactly one set, with no redundant ones).

Can anyone help me code this function in Python?

My data are arrays (extracted from satellite data): id, the array of labeled pixels with labels and IDs; label, the array containing the labels (classes); and images, the array of the converted images.

def train_valid_test_split(id_array, label_array, bands_array):
  - get the list of classes
  - loop through the classes, retrieving the unique IDs of each class
  - randomly shuffle the IDs
  - split the list of IDs (assign 50% to a train list, 20% to a validation list, and 30% to a test list)
  - at the end of the loop, recover indice_train, indice_valid and indice_test with np.where(np.isin(id_array, lstIDTrain)), and likewise for lstIDValid and lstIDTest
  - return the train / valid / test subsets of the images and labels, taken from bands_array and label_array
      return train_bands, valid_bands, test_bands, train_label, valid_label, test_label
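
The steps above could be sketched with NumPy roughly as follows. This is only a minimal sketch under some assumptions that are not stated in the question: id_array and label_array are taken to be 1-D arrays with one entry per sample, bands_array has the samples along its first axis, and each ID is assigned to the class of its first occurrence so that an ID can never end up in two subsets. The fractions and seed parameters are hypothetical additions for illustration.

```python
import numpy as np


def train_valid_test_split(id_array, label_array, bands_array,
                           fractions=(0.5, 0.2, 0.3), seed=None):
    """Split samples so that no ID is shared between train, valid and test.

    Assumes 1-D id_array / label_array aligned with the first axis of
    bands_array. `fractions` and `seed` are illustrative parameters.
    """
    id_array = np.asarray(id_array)
    label_array = np.asarray(label_array)
    bands_array = np.asarray(bands_array)
    rng = np.random.default_rng(seed)

    # One class per ID (the label of its first occurrence), so that the
    # per-class ID lists cannot overlap.
    unique_ids, first_idx = np.unique(id_array, return_index=True)
    id_labels = label_array[first_idx]

    train_ids, valid_ids, test_ids = [], [], []
    for cls in np.unique(id_labels):
        # Shuffle the unique IDs of this class, then cut the list in three.
        ids = rng.permutation(unique_ids[id_labels == cls])
        n_train = int(round(fractions[0] * len(ids)))
        n_valid = int(round(fractions[1] * len(ids)))
        train_ids.extend(ids[:n_train])
        valid_ids.extend(ids[n_train:n_train + n_valid])
        test_ids.extend(ids[n_train + n_valid:])  # the remainder goes to test

    # Boolean masks selecting every sample whose ID fell into each list.
    train_mask = np.isin(id_array, train_ids)
    valid_mask = np.isin(id_array, valid_ids)
    test_mask = np.isin(id_array, test_ids)

    return (bands_array[train_mask], bands_array[valid_mask],
            bands_array[test_mask], label_array[train_mask],
            label_array[valid_mask], label_array[test_mask])
```

Because the split is done on IDs rather than on samples, the 50/20/30 proportions hold for the number of IDs per class; the number of pixels in each subset may differ if the IDs do not all cover the same number of pixels.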

Otherwise, is there any specialized function for this that I may have missed?
