I have two data frames which the following code will provide an example of the data that I have:
df <- data.frame("ID" = c(325, 464, 464, 464, 512, 512, 687, 701, 869),
"DATE" = c("2012-11-05", "2014-04-04", "2014-04-05", "2014-04-06",
"2014-10-13", "2014-10-14", "2014-12-12", "2015-02-17", "2015-06-25"))
df2 <- data.frame("ID" = c(325, 464, 512, 701, 869, 922, 954, 989),
"DATE_1" = c("2012-11-03", "2014-04-01", "2014-10-10", "2015-02-14",
"2015-06-20", "2015-07-07", "2015-09-11", "2015-11-23"),
"DATE_2" = c("2012-11-08", "2014-04-10", "2014-10-19", "2015-02-22",
"2015-06-29", "2015-07-13", "2015-09-25", "2015-11-29"),
"INTERVAL" = c("2012-11-03--2012-08-11 UTC", "2014-04-01--2014-04-10 UTC",
"2014-10-10--2014-10-19 UTC", "2015-02-14--2014-02-22 UTC",
"2015-06-20--2015-06-29 UTC", "2015-07-07--2015-07-13 UTC",
"2015-09-11--2015-09-25 UTC", "2015-11-23--2015-11-29 UTC"),
"KEY" = c(6, 8, 233, 642, 1233, 2464, 3436, 4366))
Which should yield:
df
ID DATE
1 325 2012-11-05
2 464 2014-04-04
3 464 2014-04-05
4 464 2014-04-06
5 512 2014-10-13
6 512 2014-10-14
7 687 2014-12-12
8 701 2015-02-17
9 869 2015-06-25
df2
ID DATE_1 DATE_2 INTERVAL KEY
1 325 2012-11-03 2012-11-08 2012-11-03--2012-08-11 UTC 6
2 464 2014-04-01 2014-04-10 2014-04-01--2014-04-10 UTC 8
3 512 2014-10-10 2014-10-19 2014-10-10--2014-10-19 UTC 233
4 701 2015-02-14 2015-02-22 2015-02-14--2014-02-22 UTC 642
5 869 2015-06-20 2015-06-29 2015-06-20--2015-06-29 UTC 1233
6 922 2015-07-07 2015-07-13 2015-07-07--2015-07-13 UTC 2464
7 954 2015-09-11 2015-09-25 2015-09-11--2015-09-25 UTC 3436
8 989 2015-11-23 2015-11-29 2015-11-23--2015-11-29 UTC 4366
The interval column was created with lubridate's interval function with "DATE_1" and "DATE_2" as arguments embedded in dplyr's mutate function.
My problem is that I need to test the rows in my primary data.frame (df) if the ID matches the ID in my secondary data.frame (df2) and if the date in df falls within the interval in df2.
I would like R to create a new column in df called "NEW_KEY" and if a row in df tests true under both conditions then R should take the value in the KEY column in df2 from the row in which fit on ID and interval and insert it into the NEW_KEY column in df. If the row tests false under both conditions then R should insert NA in the row under the NEW_KEY column.
Do mind that df contains multiple duplicates which all should tests true under both conditions and should have the same KEY value filled in each row and df also contains ID which has no corresponding row in df2 and should have its NEW_KEY filled with NA.
I have attempted the following two simplistic code
df$NEW_KEY <- ifelse(ifelse(df$ID %in% df2$ID & df$DATE %within% df2$INTERVAL, df2$KEY, NA)
and
df <- mutate(df, NEW_VALUE = ifelse(df$ID %in% df2$ID & df$DATE %within% df2$INTERVAL, df2$KEY, NA))
both which yield the same error message:
1: I as.numeric(a) - as.numeric(int@start) : longer object length is not a multiple of shorter object length
2: I as.numeric(a) - as.numeric(int@start) <= int@.Data : longer object length is not a multiple of shorter object length
3: I as.numeric(a) - as.numeric(int@start) : longer object length is not a multiple of shorter object length
and only seems to fill out some rows in df and NA the rest.
So I'm currently at a halt on what to do. Any help would be greatly appreciated.
Aucun commentaire:
Enregistrer un commentaire