Tags: r, iteration

Iteratively decrease values in a grouped dataset, without changing the first row of each group, using group_map and returning a tibble


I am attempting to decrease the values of the value column by 0.000001 for observations that are not in the first row of each group, storing the result in a new column called lagged.values. I then want to fill the NAs resulting from the lag computation with the original values for the first rows.

Example Data:

library(dplyr)

test = 
  tibble(
    problems = c("money", "money", "money", "food", "food", "bills", "bills"),
    category_problems = c("financial insecurity", "financial insecurity", "financial insecurity", "cost of living", "cost of living", "financial insecurity", "financial insecurity"),
    value = c(3, 3, 3, 2, 2, 1, 1)
  )

Creation of Function:

lag.values = function(x) {
  if_else(row_number(x) != 1,
          lag(x) - 0.000001,
          x)
}

Attempt:

test |>
  mutate(lagged.values = value) |>
  group_by(value) |>
  group_map(~lag.values(.x$lagged.values))

Output:

[image: Output]

Desired Output:

[image: Desired Output]


Solution

  • Up Front: I should note that grouping on value assumes perfect equality, which is subject to floating-point issues, as discussed in Why are these numbers not equal?, Is floating-point math broken?, and https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f. Two numbers may look the same on the console, or be equal mathematically, and yet differ slightly because of how floating-point numbers are stored in digital computers. This affects base R, dplyr, data.table, Python, Julia, ... anything that uses standard floating-point storage. Arbitrary-precision libraries handle this much better, though they are less common. A quick illustration follows.
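
    A quick illustration (my own example, not part of the original answer): doubles cannot represent most decimal fractions exactly, so exact equality can fail even when the numbers are mathematically equal, and all.equal() compares with a tolerance instead.

    x <- 0.1 + 0.2
    x == 0.3                    # FALSE: 0.1 and 0.2 have no exact binary representation
    isTRUE(all.equal(x, 0.3))   # TRUE: tolerance-based comparison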


    No need for a function:

    library(dplyr)
    test |>
      mutate(.by = value, value = value - 0.000001 * (row_number() - 1)) |>
      as.data.frame()
    #   problems    category_problems    value
    # 1    money financial insecurity 3.000000
    # 2    money financial insecurity 2.999999
    # 3    money financial insecurity 2.999998
    # 4     food       cost of living 2.000000
    # 5     food       cost of living 1.999999
    # 6    bills financial insecurity 1.000000
    # 7    bills financial insecurity 0.999999
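
    If you would rather keep value unchanged and put the adjusted values in the lagged.values column named in the question, the same expression can simply target a new column (a small adaptation on my part, not part of the original answer):

    test |>
      mutate(.by = value, lagged.values = value - 0.000001 * (row_number() - 1)) |>
      as.data.frame()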
    

    The |> as.data.frame() is merely to circumvent tibble's tendency to hide some of the precision when printing; it is not required for anything else.
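
    An alternative (my addition, assuming a reasonably recent tibble/pillar) is to raise the number of significant digits that tibbles print via the pillar.sigfig option, so the tibble itself shows the difference:

    options(pillar.sigfig = 7)   # print up to 7 significant digits in tibbles
    test |>
      mutate(.by = value, value = value - 0.000001 * (row_number() - 1))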

    You asked about dplyr and purrr (a purrr-flavoured sketch is included after the data.table example below), but for other alternatives:

    ### base R
    ave(test$value, test$value, FUN = \(z) z - 0.000001 * (seq_along(z)-1))
    # [1] 3.000000 2.999999 2.999998 2.000000 1.999999 1.000000 0.999999
    ### assign back into `test$value`
    
    ### data.table
    library(data.table)
    as.data.table(test)[, value := value - 0.000001 * (seq_len(.N) - 1), value][]
    #    problems    category_problems    value
    #      <char>               <char>    <num>
    # 1:    money financial insecurity 3.000000
    # 2:    money financial insecurity 2.999999
    # 3:    money financial insecurity 2.999998
    # 4:     food       cost of living 2.000000
    # 5:     food       cost of living 1.999999
    # 6:    bills financial insecurity 1.000000
    # 7:    bills financial insecurity 0.999999
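
    Since the question also mentioned purrr, here is a purrr-flavoured sketch (my addition, not part of the original answer; list_rbind() needs purrr >= 1.0, and bind_rows() works in its place on older versions):

    ### purrr: split by value, adjust each piece, re-combine
    ### note: group_split() arranges the pieces by `value`, so the row order changes
    library(purrr)
    test |>
      group_split(value) |>
      map(~ mutate(.x, lagged.values = value - 0.000001 * (row_number() - 1))) |>
      list_rbind()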
    

    Data

    test <- structure(list(problems = c("money", "money", "money", "food", "food", "bills", "bills"), category_problems = c("financial insecurity", "financial insecurity", "financial insecurity", "cost of living", "cost of living", "financial insecurity", "financial insecurity"), value = c(3, 3, 3, 2, 2, 1, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L))