Good afternoon!
Currently I'm working with a large database that contains various information on how loans are paid by borrowers
The example is as follows:
Loan_ID <- c(1, 1, 1, 2, 2, 2)
Execution_Sum <- c(1000, 900, 800, 500, 600, 400)
Execution_Date <- c("2022-03-10", "2022-03-15", "2022-03-17", "2022-03-11", "2022-03-22", "2022-03-29")
df <- data.frame(Loan_ID, Execution_Sum, Execution_Date)
df$Execution_Date <- as.Date(df$Execution_Date)
I've got 3 columns: unique loan ID, Execution_Sum (what was paid by the borrower) and Execution_Date (when this payment actually happend).
The problem is that I want to create a subset of my data frame to get only 1 row for each unique loan (for each unique ID) in such a way that this row does not reflect all payments made by the borrower, but only 1 closest to a predetermined date (March 31, 2022 in my case)
So, the desired result should be:
1, 800, "2022-03-17" and 2, 400, "2022-03-29"
Because these two dates are the closest to March 31, 2022
Is it possible to create this kind of subset?
Thanks a lot for any help provided!
We may slice
on the row (which.min
) which gives the abs
olute min
imum difference between the 'Execution_Date' and the predetermined date after grouping by 'Loan_ID'
library(dplyr)
df %>%
group_by(Loan_ID) %>%
slice(which.min(abs(Execution_Date - as.Date("2022-03-31")))) %>%
ungroup
-output
# A tibble: 2 × 3
Loan_ID Execution_Sum Execution_Date
<dbl> <dbl> <date>
1 1 800 2022-03-17
2 2 400 2022-03-29