rdatedata.table

Row numbers of lowest dates for each year in a datatable column


I have a datatable with a column that contains dates:

> dat
          Var1  Var2
        <Date> <num>
 1: 2023-11-01 18.05
 2: 2023-12-01  4.65
 3: 2024-01-01 20.34
 4: 2024-02-01 21.71
 5: 2024-03-01 51.60
 6: 2024-04-01 55.54
 7: 2024-05-01 50.34
 8: 2024-06-01 45.90
 9: 2024-07-01 21.26
10: 2024-08-01 19.52

> dput(dat)
structure(list(Var1 = structure(c(19662, 19692, 19723, 19754, 
19783, 19814, 19844, 19875, 19905, 19936), class = "Date"), Var2 = c(18.05, 
4.65, 20.34, 21.71, 51.6, 55.54, 50.34, 45.9, 21.26, 19.52)), row.names = c(NA, 
-10L), class = c("data.table", "data.frame"))

I want a vector of the row numbers of those rows that contain the lowest date for each year, in this case:

> first.dates
[1] 1 3

How can I find these?


Solution

  • Probably this could help

    > which(dat[, rank(Var1), year(Var1)]$V1 == 1)
    [1] 1 3
    

    or

    > setorder(dat[, cbind(rid = .I, .SD)], "Var1")[, first(rid), year(Var1)]$V1
    [1] 1 3