Friday, 8 May 2015

How can I test and train multiple data sets in the form of two lists?

I would like to write a function that trains and tests models on 10 separate pairs of data sets, stored in two lists: one list of training sets and one list of matching test sets. Here are the lists:

blend_30_d<-list(desktop_30_1, desktop_30_2, desktop_30_3, desktop_30_4, desktop_30_5, desktop_30_6, desktop_30_7, desktop_30_8, desktop_30_9, desktop_30_10)

blend_30_td<-list(desktop_30_t1, desktop_30_t2, desktop_30_t3, desktop_30_t4, desktop_30_t5, desktop_30_t6, desktop_30_t7, desktop_30_t8, desktop_30_t9, desktop_30_t10)

Each individual data set has the following column names:

[1] "date" "Wkday" "Imps" "Clicks" "Total_Cost" "Units"
[7] "January" "February" "March" "April" "May" "June"
[13] "July" "August" "September" "October" "November" "December"
[19] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday"
[25] "Sunday" "Vday" "Tgiving" "Xmas" "XmasE" "NYE"
[31] "NYD" "July4" "Labor" "Memorial" "Mob_App_Launch" "Auto_Approve_Launch"

I've built the following function. I want the model trained on blend_30_d[[1]] to be tested against blend_30_td[[1]], and likewise for each remaining pair.

d_cost <- function(train, test){
    ####Run regression on training
    q<-lm(Total_Cost ~ . -date - Wkday - Imps - Clicks + poly(date, 2), data=train)
    ####Predict values into test set
    test_cost_d <- predict.lm(q, x=test)
    ####Calculate R^2 between predicted vs. actual values
    z<-(cor(test_cost_d, test$Total_Cost))^2
}

d_cost(blend_30_d, blend_30_td)

I'm receiving the following error: Error in terms.formula(formula, data = data) : duplicated name 'date' in data frame using '.'

I'm not sure that this is the correct approach with two lists. Any suggestions? Thanks!
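One possible direction, offered as a sketch rather than a definitive answer: the call d_cost(blend_30_d, blend_30_td) passes the whole lists, so lm() receives a list of ten data frames as `data`. R most likely coerces that list into one wide data frame in which `date` appears once per data set, which would explain the "duplicated name" error. Two further points in the function as written: predict.lm() takes the new data through `newdata` (an `x` argument is silently ignored, so predictions come from the training data), and a function whose last expression is an assignment returns its value invisibly. The version below fits one pair at a time and uses mapply() to walk both lists in parallel. The helper make_set() and its synthetic columns are hypothetical stand-ins for the real data, and `date` is assumed to be numeric so that poly() accepts it.

```r
# Fit on one training data frame and score on its matching test data frame.
d_cost <- function(train, test) {
  # Same formula as in the question; `data` must be a single data frame
  fit <- lm(Total_Cost ~ . - date - Wkday - Imps - Clicks + poly(date, 2),
            data = train)
  # `newdata` (not `x`) is the argument predict() uses for new observations
  pred <- predict(fit, newdata = test)
  # Squared correlation between predicted and actual cost, returned visibly
  cor(pred, test$Total_Cost)^2
}

# Hypothetical stand-in data: a few of the columns named in the question,
# with `date` stored as a plain number (an assumption, not from the post).
make_set <- function(n, seed) {
  set.seed(seed)
  d <- data.frame(
    date   = seq_len(n),
    Wkday  = (seq_len(n) - 1) %% 7 + 1,
    Imps   = rpois(n, 1000),
    Clicks = rpois(n, 50),
    Units  = rpois(n, 20)
  )
  d$Total_Cost <- 5 * d$Units + 0.02 * d$date^2 + rnorm(n)
  d
}

train_sets <- lapply(1:3, function(i) make_set(60, i))
test_sets  <- lapply(1:3, function(i) make_set(30, 100 + i))

# mapply() pairs train_sets[[i]] with test_sets[[i]] and returns one value per pair
r2 <- mapply(d_cost, train_sets, test_sets)
print(r2)
```

With the real data, the equivalent call would be mapply(d_cost, blend_30_d, blend_30_td). Note that blend_30_d[[1]] extracts the first data frame, whereas blend_30_d[1] returns a one-element list.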
