Sunday, September 15, 2019

Testing an existing model on a different population?

Hi everyone, I have a machine learning problem; it is a classification task.

I have two datasets; let's call them data1 and data2.

data1 is historical data (50,000 rows, 30 columns), and data2 is new data (150 rows, 10 columns; all 10 of these columns also exist in data1).
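Because the model can only be applied to data2 if it sees exactly the same features in the same order, the first step is to restrict data1 to the columns the two tables share. A minimal sketch with pandas, assuming both tables are DataFrames; the column names `f0` to `f29` and the random values are hypothetical stand-ins, and the label column of data1 is left out for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real tables: data1 has 30 columns,
# data2 has 10 columns that also exist in data1.
rng = np.random.default_rng(0)
cols1 = [f"f{i}" for i in range(30)]
data1 = pd.DataFrame(rng.normal(size=(50_000, 30)), columns=cols1)
data2 = pd.DataFrame(rng.normal(size=(150, 10)), columns=cols1[:10])

# Keep only the shared columns, in data2's column order, so training
# and test time use identical feature layouts.
shared = [c for c in data2.columns if c in data1.columns]
X1 = data1[shared]
X2 = data2[shared]

print(X1.shape, X2.shape)
```

Selecting by name rather than by position guards against the two files listing the shared columns in a different order.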

What I want to do:

I want to train a classification model on data1 and then TEST IT on data2.

=====================================================================

What I have done:

1. Because the two datasets have different columns, I trained the model on data1 using only the 10 columns it shares with data2, so before training data1 has 50,000 rows and 10 columns.

2. I modeled data1 on those 10 features and got very good scores (ROC AUC, accuracy, and precision are all above 90%).

3. I also tried a train/test split and cross-validation on data1 ONLY, and the results were very good.
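The in-sample checks described above can be sketched with scikit-learn as follows. This is an illustration, not the author's actual code: the synthetic data from `make_classification` and the choice of `LogisticRegression` are assumptions standing in for data1 and whatever model was used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in (assumption) for data1 restricted to the 10 shared features.
X1, y1 = make_classification(n_samples=5_000, n_features=10, n_informative=5, random_state=0)

# Hold-out split: fit on 80% of data1, score on the remaining 20%.
X_tr, X_te, y_tr, y_te = train_test_split(X1, y1, test_size=0.2, stratify=y1, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
holdout_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# 5-fold stratified cross-validation, still entirely inside data1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(LogisticRegression(max_iter=1000), X1, y1, cv=cv, scoring="roc_auc")

print(f"hold-out AUC: {holdout_auc:.3f}")
print(f"CV AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")
```

Both estimates draw their test rows from the same population as the training rows, so they measure in-distribution performance only; they say nothing about how the model behaves on data drawn from somewhere else.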

BUT... the problem is:

When I tested the data1 model on data2, the results became very bad.
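One way a population difference can produce exactly this pattern is sketched below as a synthetic illustration. The generator `make_population` and the scenario (the label depends on feature 0 in data1 but on feature 1 in data2) are hypothetical assumptions chosen to make the effect visible; they are not a claim about the real data. The model is trained once on all of data1 and then scored, frozen, on data2 with the same metrics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score

rng = np.random.default_rng(0)

def make_population(n, label_feature):
    """Hypothetical generator: 10 standard-normal features, label driven by one of them."""
    X = rng.normal(size=(n, 10))
    y = (X[:, label_feature] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

# Assumed scenario: the feature/label relationship differs between populations.
X1, y1 = make_population(50_000, label_feature=0)
X2, y2 = make_population(150, label_feature=1)

# Train once on all of data1, then score the frozen model on both tables.
model = LogisticRegression(max_iter=1000).fit(X1, y1)

def report(name, X, y):
    proba = model.predict_proba(X)[:, 1]
    pred = (proba >= 0.5).astype(int)
    auc = roc_auc_score(y, proba)
    print(f"{name}: AUC={auc:.3f} acc={accuracy_score(y, pred):.3f} "
          f"precision={precision_score(y, pred, zero_division=0):.3f}")
    return auc

auc_data1 = report("data1 (training population)", X1, y1)
auc_data2 = report("data2 (new population)", X2, y2)
```

In this setup the model scores well on data1 but close to chance on data2, even though the features have the same marginal distribution in both tables; only the relationship between features and label changed.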

So what is happening here? Am I getting bad results because data1 and data2 come from different populations?

Is there a sound theory that supports my assumption, or am I mistaken? Thank you.
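The "different population" hypothesis can be checked directly before looking for theory. A minimal sketch of two common checks, using synthetic arrays as hypothetical stand-ins: adversarial validation (train a classifier to tell data1 rows from data2 rows; a cross-validated AUC near 0.5 means they look alike, well above 0.5 means they differ) and a per-feature two-sample Kolmogorov-Smirnov test via `scipy.stats.ks_2samp` to locate which columns differ. The 1,500-row subsample of data1 and the shift in feature 3 are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

def adversarial_auc(a, b, seed=0):
    """Cross-validated AUC of a classifier asked to tell rows of `a` from rows of `b`."""
    X = np.vstack([a, b])
    origin = np.r_[np.zeros(len(a), dtype=int), np.ones(len(b), dtype=int)]  # 0 = data1, 1 = data2
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, origin, cv=cv, scoring="roc_auc")
    return scores.mean()

# Hypothetical data: a 1,500-row sample of data1 and two versions of a 150-row data2.
data1 = rng.normal(size=(1_500, 10))
data2_same = rng.normal(size=(150, 10))      # same population as data1
data2_shifted = rng.normal(size=(150, 10))
data2_shifted[:, 3] += 1.0                   # assumed shift in feature 3 only

auc_same = adversarial_auc(data1, data2_same)
auc_shifted = adversarial_auc(data1, data2_shifted)
print(f"adversarial AUC, same population:    {auc_same:.3f}")
print(f"adversarial AUC, shifted population: {auc_shifted:.3f}")

# Per-feature KS test: a large statistic (small p-value) flags a shifted column.
ks_stats = []
for j in range(data1.shape[1]):
    res = ks_2samp(data1[:, j], data2_shifted[:, j])
    ks_stats.append(res.statistic)
    print(f"feature {j}: KS statistic={res.statistic:.3f}, p={res.pvalue:.3g}")
```

These checks only compare feature distributions. A change in how the label depends on the features can also break the model while leaving the features themselves looking identical, so a low adversarial AUC does not rule out a population difference.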
