I am a beginner in R. I used cut function in R to bin my data. My data starts from 0 but after cutting the lower boundary has a negative result and I have no idea why this happened.
My code is:
cancer_rtcl$cancer_rate_cut=cut(cancer_rtcl$rate,6)
The statistical summary of my data is:
> summary(cancer_rtcl$rate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 13.3 16.5 16.4 18.8 63.5
> dput(cancer_rtcl$rate)
c(63.5, 41.5, 36, 33.9, 29.7, 27.2, 27.2, 26, 25.9, 25.9, 25.3,
25.1, 24.6, 24.3, 23.6, 23.3, 22.8, 22.7, 22.5, 22.4, 22.3, 22.3,
21.9, 21.9, 21.7, 21.6, 21.5, 21.4, 21.3, 21.2, 21.2, 20.9, 20.8,
20.7, 20.5, 20.5, 20.3, 20.2, 20, 19.7, 19.7, 19.6, 19.6, 19.5,
19.4, 19.1, 19, 19, 19, 18.9, 18.9, 18.8, 18.8, 18.8, 18.8, 18.8,
18.7, 18.5, 18.5, 18.5, 18.4, 18.3, 18.3, 18.2, 18.2, 18.2, 18.1,
18.1, 18, 17.9, 17.9, 17.9, 17.8, 17.8, 17.8, 17.7, 17.7, 17.6,
17.6, 17.6, 17.5, 17.4, 17.4, 17.3, 17.3, 17.3, 17.3, 17.3, 17.2,
17.2, 17.1, 17.1, 17.1, 17, 17, 16.9, 16.9, 16.9, 16.8, 16.8,
16.7, 16.6, 16.6, 16.6, 16.5, 16.5, 16.5, 16.5, 16.5, 16.4, 16.4,
16.4, 16.4, 16.2, 16.1, 16, 16, 16, 16, 15.9, 15.9, 15.8, 15.8,
15.7, 15.7, 15.7, 15.7, 15.6, 15.6, 15.6, 15.6, 15.6, 15.5, 15.4,
15.4, 15.4, 15.3, 15.3, 15.3, 15.3, 15.2, 15.1, 15.1, 15, 15,
14.8, 14.6, 14.6, 14.4, 14.2, 14.2, 14.1, 14.1, 14.1, 14.1, 14,
13.9, 13.8, 13.7, 13.6, 13.6, 13.6, 13.3, 13.2, 13.2, 13.1, 13.1,
13, 12.9, 12.9, 12.7, 12.6, 12.5, 12.4, 12.3, 12.3, 12.2, 12,
11.9, 11.8, 11.6, 11.6, 11.4, 11.4, 11.3, 11, 10.8, 10.8, 10.7,
10.6, 10.5, 10.2, 9.9, 9.8, 9.7, 9.7, 9.6, 9.6, 9.5, 9.3, 9.2,
9.2, 9, 9, 8, 7.9, 7.3, 7.1, 7, 6.9, 6.3, 4.6, 0, 0, 0, 0, 0)
But the cutting result is:
6 Levels: (-0.0635,10.6] (10.6,21.2] (21.2,31.8] (31.8,42.3] ... (52.9,63.6]
As you can see, the lowest boundary is a negative result, which is not ideal because I need to make a map based on the binned data.
I also tried another type of coding:
cancer_rtcl$rate_cut=cut(cancer_rtcl$rate,c(5,10,15,20,25))
But in this way, I lost the data larger than 25.
Can anyone help to figure out how to bin the data and get the exact lowest and highest boundaries? Thanks!
Does this work to capture data larger than 25? cancer_rtcl$rate_cut1=cut(cancer_rtcl$rate,c(5,10,15,20,25,Inf))