I'm currently working on the civil conflict, and my dataset is the UCDP Armed Conflict Dataset. My focus is on the monthly duration of the civil war. However, I'm having trouble converting the original conflict-year data into conflict-month data.
I'll provide an example of my data below:
conflict_id | start_date | end_date | year | termination |
---|---|---|---|---|
100 | 1946-05-18 | NA | 1946 | 0 |
100 | 1946-05-18 | 1947-03-01 | 1947 | 1 |
101 | 1950-05-01 | 1950-07-01 | 1947 | 1 |
I am expecting following result :
conflict_id | year | month | duration | termination |
---|---|---|---|---|
100 | 1946 | 5 | 1 | 0 |
100 | 1946 | 6 | 2 | 0 |
100 | 1946 | 7 | 3 | 0 |
... | ... | ... | ... | |
100 | 1947 | 2 | 9 | 0 |
100 | 1947 | 3 | 10 | 1 |
Any suggestions, examples would be greatly appreciated. Thank you in advance for your time and expertise!
one approach (rather long 'tidy-style' pipeline so you might want to break it up to inspect which does which):
library(tidyr)
library(lubridate)
library(zoo)
df |> ## df is a dataframe of the example data you provided
mutate(across(ends_with('_date'),
~ as.Date(.x) |> as.yearmon()
)
) |>
group_by(conflict_id) |>
summarize(start = min(start_date, na.rm = TRUE),
end = max(end_date, na.rm = TRUE)
) |>
rowwise() |>
mutate(ym = seq(start, end, 1/12) |> list()) |>
unnest_longer(ym) |>
select(conflict_id, ym) |>
group_by(conflict_id) |>
mutate(year = year(ym),
month = month(ym),
duration = row_number(),
termination = ifelse(duration < max(duration), 0, 1)
)
+ # A tibble: 14 x 6
# Groups: conflict_id [2]
conflict_id ym year month duration termination
<int> <yearmon> <dbl> <dbl> <int> <dbl>
1 100 Mai 1946 1946 5 1 0
2 100 Jun 1946 1946 6 2 0
## ... lines removed
11 100 Mär 1947 1947 3 11 1
12 101 Mai 1950 1950 5 1 0
13 101 Jun 1950 1950 6 2 0
14 101 Jul 1950 1950 7 3 1
>