> test
# A tibble: 30 × 2
# Groups: Week [30]
Week Dates
<dbl> <chr>
1 2 2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08
2 3 2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15
3 4 2023-10-18, 2023-10-19, 2023-10-20, 2023-10-20, 2023-10-21, 2023-10-21, 2023-10-22, 2023-10-22
4 5 2023-10-25, 2023-10-25, 2023-10-26, 2023-10-27, 2023-10-28, 2023-10-29, 2023-10-29, 2023-10-30
5 6 2023-11-01, 2023-11-01, 2023-11-01, 2023-11-01, 2023-11-02, 2023-11-02, 2023-11-03, 2023-11-04, 2023-11-05, 2023-11-05
6 7 2023-11-09, 2023-11-10, 2023-11-13
7 8 2023-11-16, 2023-11-17, 2023-11-18, 2023-11-19, 2023-11-21
8 9 2023-11-22, 2023-11-22, 2023-11-23
9 10 2023-11-29, 2023-11-30, 2023-12-02, 2023-12-03, 2023-12-04
10 11 2023-12-06, 2023-12-07, 2023-12-08, 2023-12-08, 2023-12-09, 2023-12-10, 2023-12-10
# ℹ 20 more rows
The dates for each week are pasted together with commas and stored as a single character string in the data set 'test'.
I need to count the unique dates in each week.
For example, the count for week 2 should be 4 (2023-10-04, 2023-10-05, 2023-10-06, 2023-10-08) and the count for week 3 should also be 4 (2023-10-11, 2023-10-12, 2023-10-14, 2023-10-15), and so on.
But everything I tried returns 1 for every week:
> with(test, tapply(Dates, Week, function(x) nlevels(unique(as.factor(x)))))
2 3 4 5 6 7 8 9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> with(test, sapply(Dates, function(x) nlevels(unique(as.factor(x)))))
2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08
1
2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15
1
2023-10-18, 2023-10-19, 2023-10-20, 2023-10-20, 2023-10-21, 2023-10-21, 2023-10-22, 2023-10-22
1
2023-10-25, 2023-10-25, 2023-10-26, 2023-10-27, 2023-10-28, 2023-10-29, 2023-10-29, 2023-10-30
1
2023-11-01, 2023-11-01, 2023-11-01, 2023-11-01, 2023-11-02, 2023-11-02, 2023-11-03, 2023-11-04, 2023-11-05, 2023-11-05
1
2023-11-09, 2023-11-10, 2023-11-13
1
> n_distinct(unique(as.factor(test$Dates[1])))
[1] 1
Each string is treated as a single chunk.
> unique(factor(str_split(test$Dates[1], ',')))
[1] c("2023-10-04", " 2023-10-05", " 2023-10-05", " 2023-10-06", " 2023-10-06", " 2023-10-06", " 2023-10-08", " 2023-10-08")
Levels: c("2023-10-04", " 2023-10-05", " 2023-10-05", " 2023-10-06", " 2023-10-06", " 2023-10-06", " 2023-10-08", " 2023-10-08")
> unique(str_split(test$Dates[1], ','))
[[1]]
[1] "2023-10-04" " 2023-10-05" " 2023-10-05" " 2023-10-06" " 2023-10-06" " 2023-10-06" " 2023-10-08" " 2023-10-08"
> nlevels(factor(str_split(test$Dates[1], ',')))
[1] 1
Splitting the string does not give me the distinct (unique) count either.
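The outputs above hint at why the count stays at 1: the split returns a list with one element per input string, so anything applied to the whole result sees a single item. A minimal sketch of that behaviour, using base R strsplit() in place of str_split() and a variable s (a name introduced here) holding the first row copied from the printed tibble:

```r
# First row of the Dates column, copied from the printed tibble above.
s <- "2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08"

# Base R counterpart of stringr::str_split(); the result is a list.
parts <- strsplit(s, ",")

is.list(parts)       # TRUE: one list element per input string
length(parts)        # 1, so counting over `parts` itself can only give 1
length(parts[[1]])   # 8 pieces once the list element is extracted

# Trim the leading spaces left by splitting on "," and count distinct dates.
n_unique <- length(unique(trimws(parts[[1]])))
n_unique             # 4
```

Extracting the element with [[1]] (or applying a function over the list) is what makes the individual dates visible to unique().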
Example data:
x <- c(
"2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08",
"2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15"
)
Split each string on ', ' and count the unique pieces, e.g. like this:
x |> strsplit(', ') |> sapply(\(x) length(unique(x)))
Or using the tidyverse (str_split() from stringr, map_int() from purrr, n_distinct() from dplyr):
x |> str_split(', ') |> map_int(n_distinct)
Both give
[1] 4 4
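To attach the counts to the data itself, the same idea can be applied to the Dates column. This is only a sketch under assumptions: test_demo is a hypothetical stand-in for 'test' rebuilt from the first two printed rows, n_dates is a column name chosen here, and the separator is assumed to be a comma followed by optional whitespace.

```r
# Hypothetical stand-in for `test`, rebuilt from the first two printed rows.
test_demo <- data.frame(
  Week  = c(2, 3),
  Dates = c(
    "2023-10-04, 2023-10-05, 2023-10-05, 2023-10-06, 2023-10-06, 2023-10-06, 2023-10-08, 2023-10-08",
    "2023-10-11, 2023-10-12, 2023-10-12, 2023-10-14, 2023-10-15"
  ),
  stringsAsFactors = FALSE
)

# Split every string on a comma plus optional whitespace (assumed separator).
pieces <- strsplit(test_demo$Dates, ",\\s*")

# Count the distinct dates per row; vapply() guarantees an integer vector.
test_demo$n_dates <- vapply(pieces, function(d) length(unique(d)), integer(1))

# n_dates is 4 for both weeks in this stand-in data.
test_demo[, c("Week", "n_dates")]
```

Because each row already holds one week, no grouping step is needed here; the count is computed row by row.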