Thursday, May 21, 2020

How can I drop factor levels from one table by referring to a different table in R?

I am building a predictive model with a train and a test data set. The train set contains every row where Termination_Date is not NA; the test set contains every row where Termination_Date is NA. When I run my predictive model, R says there are new levels in the factor columns I am using. I would like to drop the new levels from the affected columns.

library(dplyr)  # provides %>% and mutate()

# Training rows: Termination_Date is known
AllData4 <- AllData3[!is.na(AllData3$Termination_Date), ]

# Convert the categorical predictors to factors
# (named `train`, lower case, to match the calls below)
train <- AllData4 %>%
  mutate(Work_Country_Code = as.factor(Work_Country_Code),
         Person_Type = as.factor(Person_Type),
         Job_Family = as.factor(Job_Family),
         Department = as.factor(Department),
         Assignment_Person_Type = as.factor(Assignment_Person_Type),
         Transfer = as.factor(Transfer))

model1 <- lm(Termination_Date ~ Date_Of_Birth + Work_Country_Code + Person_Type +
               Contractual_Date_Of_Joining + Date_Joined_Fujitsu + Job_Family +
               Department + Assignment_Person_Type + Transfer,
             data = train)

# Test rows: Termination_Date is missing
test1 <- AllData3[is.na(AllData3$Termination_Date), ]
test2 <- test1  # working copy of the test set
pred <- predict(model1, newdata = test2)

Below are some things that I have tried that are not working:

test2$Work_Country_Code <- droplevels(test1$Work_Country_Code, except = is.na(AllData3$Termination_Date))
test2$Work_Country_Code <- factor(train$Work_Country_Code)
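
For comparison, here is a minimal sketch of one common way to handle this. It is not necessarily the right fix for the data above. It uses a small made-up data set (the toy `train`, `test1`, `y`, and `Department` values are assumptions for illustration, not the real AllData3). The idea is to keep only the test rows whose value was seen during training, then rebuild the factor with the training levels so that both tables agree.

```r
# Toy stand-ins for the train/test split (hypothetical data, not AllData3).
train <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  Department = factor(c("A", "B", "A", "B", "C", "C"))
)
test1 <- data.frame(
  Department = c("A", "D", "C", "B"),  # "D" never appears in train
  stringsAsFactors = FALSE
)

model <- lm(y ~ Department, data = train)

# Keep only the test rows whose value is among the training levels.
seen <- test1$Department %in% levels(train$Department)
test2 <- test1[seen, , drop = FALSE]

# Rebuild the factor using exactly the training levels, in the same order.
test2$Department <- factor(test2$Department, levels = levels(train$Department))

pred <- predict(model, newdata = test2)
```

With several factor columns, the same two steps would be repeated per column. To keep all test rows instead, skipping the filtering step and calling `factor(..., levels = ...)` alone turns the unseen values into NA.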
