I am building a predictive model and I have a train and a test data set. The train set contains every row where Termination_Date is not NULL/NA; the test set contains every row where Termination_Date is NULL/NA. When I run the prediction, R says there are new levels in the columns I am using. I would like to remove the columns where there are new levels.
library(dplyr)  # provides %>% and mutate()

# Training rows: Termination_Date is known
AllData4 <- AllData3[!is.na(AllData3$Termination_Date), ]
train <- AllData4 %>%
  mutate(Work_Country_Code = as.factor(Work_Country_Code),
         Person_Type = as.factor(Person_Type),
         Job_Family = as.factor(Job_Family),
         Department = as.factor(Department),
         Assignment_Person_Type = as.factor(Assignment_Person_Type),
         Transfer = as.factor(Transfer))
model1 <- lm(Termination_Date ~ Date_Of_Birth + Work_Country_Code + Person_Type +
               Contractual_Date_Of_Joining + Date_Joined_Fujitsu + Job_Family +
               Department + Assignment_Person_Type + Transfer, data = train)
# Test rows: Termination_Date is missing
test1 <- AllData3[is.na(AllData3$Termination_Date), ]
test2 <- test1  # assumed: test2 starts as a copy of test1
pred <- predict(model1, newdata = test2)
Below are some things that I have tried that are not working:
test2$Work_Country_Code <- droplevels(test1$Work_Country_Code, except = is.na(AllData3$Termination_Date))
test2$Work_Country_Code <- factor(train$Work_Country_Code)
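For reference, here is a minimal sketch of one way to handle unseen levels. It assumes the goal is to drop the test rows whose value never appeared in the training data; dropping a whole column would instead mean refitting the model without that predictor. The toy data frames and the names y, new_levels and test_ok are made up for illustration. Only Work_Country_Code, train, test2 and model1 come from the code above.

```r
# Toy stand-ins for the real data (hypothetical values).
train <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  Work_Country_Code = factor(c("GB", "GB", "DE", "DE", "JP", "JP"))
)
test2 <- data.frame(
  Work_Country_Code = c("GB", "FR", "JP", "US"),
  stringsAsFactors = FALSE
)

model1 <- lm(y ~ Work_Country_Code, data = train)

# Values present in the test data but never seen when fitting.
new_levels <- setdiff(unique(test2$Work_Country_Code),
                      levels(train$Work_Country_Code))

# Re-encode the test column with the training levels;
# any value outside those levels becomes NA.
test2$Work_Country_Code <- factor(test2$Work_Country_Code,
                                  levels = levels(train$Work_Country_Code))

# Keep only the rows the model can score, then predict.
test_ok <- test2[!is.na(test2$Work_Country_Code), , drop = FALSE]
pred <- predict(model1, newdata = test_ok)
```

With several factor predictors, the same re-encoding would have to be repeated for each column. The levels used during fitting are also stored in model1$xlevels, which may be more convenient than going back to train.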