Hi all, I have a machine learning problem; it is about classification.
I have two datasets, let's call them data1 and data2.
data1 is historical data (50,000 rows with 30 columns) and data2 is new data (150 rows with 10 columns; those 10 columns also exist in data1).
What I want to do:
I want to train a machine learning model on data1 and then TEST it on data2 for this classification problem.
=====================================================================
What I have done:
1. Because of the column difference, I took only the 10 columns of data2 that also exist in data1 to train the model, so before training data1 has 50,000 rows with 10 columns.
2. I modeled data1 (based on those 10 features) and got very good scores (ROC AUC, accuracy, and precision are all above 90%).
3. I tried a train/test split and cross-validation on data1 ONLY, and the results are very good (a simplified sketch of this step is shown after this list).
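For concreteness, this step looks roughly like the sketch below; the file names, the shared column list, and the RandomForestClassifier are just placeholders, not my exact setup:

```python
# Simplified sketch of the data1-only training / cross-validation step.
# File names, the shared column list, and the classifier are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data1 = pd.read_csv("data1.csv")             # 50,000 rows x 30 columns
shared_cols = ["col_a", "col_b", "col_c"]    # stands in for the 10 columns shared with data2

X1 = data1[shared_cols]                      # keep only the shared columns
y1 = data1["target"]                         # classification label

model = RandomForestClassifier(random_state=42)

# Cross-validation on data1 ONLY: this is where the scores come out above 0.90
cv_auc = cross_val_score(model, X1, y1, cv=5, scoring="roc_auc")
print("data1 cross-validated ROC AUC:", cv_auc.mean())
```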
BUT the problem is:
when I test the data1 model on data2, the results become very bad (sketched below).
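Concretely, the failing step is something like this (same placeholder names as above):

```python
# Fit on data1 (10 shared columns), then score the same model on data2.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

shared_cols = ["col_a", "col_b", "col_c"]    # the 10 columns present in both datasets

data1 = pd.read_csv("data1.csv")             # 50,000 rows
data2 = pd.read_csv("data2.csv")             # 150 rows

model = RandomForestClassifier(random_state=42)
model.fit(data1[shared_cols], data1["target"])

# The model that looks great under data1 cross-validation scores much worse here
proba = model.predict_proba(data2[shared_cols])[:, 1]
print("data2 ROC AUC:", roc_auc_score(data2["target"], proba))
```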
So I'm asking: what happened here? Am I getting bad results because data1 and data2 come from different populations?
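One idea I have for sanity-checking this "different population" suspicion (I have not actually done this yet) is an adversarial-validation style check: train a classifier to tell data1 rows from data2 rows using only the 10 shared columns. Same placeholder names as above:

```python
# Idea (not done yet): adversarial validation to test whether data1 and data2
# are distinguishable from the 10 shared features alone.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

shared_cols = ["col_a", "col_b", "col_c"]    # the 10 shared columns

data1 = pd.read_csv("data1.csv")
data2 = pd.read_csv("data2.csv")

# Label each row by which dataset it came from, then try to predict that label
X = pd.concat([data1[shared_cols], data2[shared_cols]], ignore_index=True)
y = [0] * len(data1) + [1] * len(data2)

clf = RandomForestClassifier(random_state=42)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
# AUC near 0.5 -> the two datasets look alike; AUC near 1.0 -> clear distribution shift
print("data1-vs-data2 ROC AUC:", auc)
```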
Is there a good theory that matches this assumption, or a correction if I'm mistaken? Thank you.