I am attempting to decrease the values of the `value` column by 0.000001 for observations that are not in the first row, storing the result in a new column called `lagged.values`. I then want to fill the NAs resulting from the lag computation with the original values for the first rows.
Example Data:
library(dplyr)  # dplyr re-exports tibble()
test = tibble(
  problems = c("money", "money", "money", "food", "food", "bills", "bills"),
  category_problems = c("financial insecurity", "financial insecurity", "financial insecurity", "cost of living", "cost of living", "financial insecurity", "financial insecurity"),
  value = c(3, 3, 3, 2, 2, 1, 1)
)
Creation of Function:
lag.values = function(x) {
  if_else(row_number(x) != 1,
          lag(x) - 0.000001,
          x)
}
Attempt:
test |>
  mutate(lagged.values = value) |>
  group_by(value) |>
  group_map(~lag.values(.x$lagged.values))
Output:
Desired Output:
Up Front: I should note that assuming perfect equality when grouping by `value` will be subject to floating-point issues, as discussed in Why are these numbers not equal?, Is floating-point math broken?, and https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f. While two numbers may look the same on the console, and/or should be equal mathematically, the way floating-point numbers are stored in digital computers leads to small snafus periodically. This affects base R, dplyr, data.table, python, julia, ... anything that uses standard number storage. Large-precision libraries exist that are much better at this, though they are less common.
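As a minimal illustration of that floating-point caveat (base R only, not specific to this data), two values that are mathematically equal can fail an exact `==` comparison while passing a tolerance-based one:

```r
# Mathematically 0.1 + 0.2 equals 0.3, but the stored doubles differ slightly
x <- 0.1 + 0.2

exact_equal <- x == 0.3                     # exact comparison of the stored doubles
close_equal <- isTRUE(all.equal(x, 0.3))    # comparison within a small tolerance

exact_equal  # FALSE
close_equal  # TRUE
```

Grouping by a numeric column relies on the exact comparison, so values that were computed (rather than typed in, as in the example data) may land in different groups even when they print identically.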
No need for a function:
library(dplyr)
test |>
mutate(.by = value, value = value - 0.000001 * (row_number() - 1)) |>
as.data.frame()
# problems category_problems value
# 1 money financial insecurity 3.000000
# 2 money financial insecurity 2.999999
# 3 money financial insecurity 2.999998
# 4 food cost of living 2.000000
# 5 food cost of living 1.999999
# 6 bills financial insecurity 1.000000
# 7 bills financial insecurity 0.999999
The `|> as.data.frame()` is merely to circumvent tibble's tendency to hide some precision; it is not required for anything else.
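The code above overwrites `value`. If the result should instead go into a separate `lagged.values` column, as the question describes, a sketch along the same lines might look like this. It assumes dplyr 1.1.0 or later for the `.by` argument, and the name `out` is only illustrative:

```r
library(dplyr)

# Trimmed-down copy of the example data (only the columns needed here)
test <- tibble(
  problems = c("money", "money", "money", "food", "food", "bills", "bills"),
  value = c(3, 3, 3, 2, 2, 1, 1)
)

# Within each group of identical values, row 1 gets offset 0,
# row 2 gets 1e-06, row 3 gets 2e-06, and so on
out <- test |>
  mutate(.by = value,
         lagged.values = value - 0.000001 * (row_number() - 1))

as.data.frame(out)
```

Because the first row of each group has an offset of zero, it keeps its original value, so there are no NAs to fill afterwards.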
You asked using dplyr and purrr, but for alternatives:
### base R
ave(test$value, test$value, FUN = \(z) z - 0.000001 * (seq_along(z)-1))
# [1] 3.000000 2.999999 2.999998 2.000000 1.999999 1.000000 0.999999
### assign back into `test$value`
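For the assignment step, a minimal base R sketch could store the result in a new column rather than printing it. The column name `lagged.values` is taken from the question, and the one-column data frame `df` is a stand-in for `test`:

```r
# Stand-in for the example data; only the numeric column matters here
df <- data.frame(value = c(3, 3, 3, 2, 2, 1, 1))

# ave() applies FUN within each group of identical values and
# returns the results in the original row order
df$lagged.values <- ave(
  df$value, df$value,
  FUN = \(z) z - 0.000001 * (seq_along(z) - 1)
)

df
```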
### data.table
library(data.table)
as.data.table(test)[, value := value - 0.000001 * (seq_len(.N) - 1), value][]
# problems category_problems value
# <char> <char> <num>
# 1: money financial insecurity 3.000000
# 2: money financial insecurity 2.999999
# 3: money financial insecurity 2.999998
# 4: food cost of living 2.000000
# 5: food cost of living 1.999999
# 6: bills financial insecurity 1.000000
# 7: bills financial insecurity 0.999999
Data
test <- structure(list(problems = c("money", "money", "money", "food", "food", "bills", "bills"), category_problems = c("financial insecurity", "financial insecurity", "financial insecurity", "cost of living", "cost of living", "financial insecurity", "financial insecurity"), value = c(3, 3, 3, 2, 2, 1, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L))