My data covers about 40 animals (ids) with locations obtained by telemetry, and I have already defined 3 areas: AR, the breeding area; AM, the migration area; and AA, the feeding area. The first locations of all animals are in AR. Sometimes an animal that is still in its breeding period (in AR) goes out to AM a few times and then comes back to AR. Only when an animal has AM locations exclusively has it started the migration, which lasts until it arrives at the feeding area AA. So the animals start in AR, then migrate through AM, and then arrive at the feeding area AA.
I'm trying to create a new column based on some conditions, but I don't know yet how to do it. For example, I have this data frame:
id area
2304 AR
2304 AR
2304 AR
2304 AM # this AM run, for example, can repeat up to 20 times before coming back to AR
2304 AM
2304 AR
2304 AR
2304 AR
2304 AM
2304 AM
2304 AM
2304 AM
2304 ...
2304 AM
2304 AM
2304 AM
2304 AA
2304 AA
2304 ...
2304 AA
So, when there are x AR rows followed by 1 to 20 AM rows and then AR again, I want the new column to contain AR for those AM rows. From the moment there are only AM rows, without coming back to AR, I want the new column to contain AM. AA is fine as it is: AA is always AA.
I expected this:
id area fixed_area
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AR # this AM run, for example, can repeat up to 20 times before coming back to AR
2304 AM AR
2304 AR AR
2304 AR AR
2304 AR AR
2304 AM AM
2304 AM AM
2304 AM AM
2304 AM AM
2304 ... ...
2304 AM AM
2304 AM AM
2304 AM AM
2304 AA AA
2304 AA AA
2304 ... ...
2304 AA AA
I tried the code below, but AA is missing from the result. Maybe the problem is that this separation needs to be done per animal (id):
> table(df$area)
AA AM AR
31460 39101 28820
> class(df$area)
[1] "character"
> idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))
> df$fixed_area <- with(df, replace(area, idx < max(idx[area == 'AM']), 'AR'))
> table(df$fixed_area)
AM AR
145 99236
>
After this I ran dput on the data frame, but it has more than 90,000 rows, so I copied only the head:
> dput(head(df))
structure(list(DeployID = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), Start = structure(c(1477323868,
1477323946, 1477324002, 1477324044, 1477324260, 1477324480), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), End = structure(c(1477323944, 1477324000,
1477324042, 1477324170, 1477324458, 1477324542), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), What = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Dive", "Message", "Surface"), class = "factor"),
Shape = structure(c(2L, 4L, 3L, 2L, 2L, 2L), .Label = c("",
"Square", "U", "V"), class = "factor"), DepthMean = c(14.5,
16.5, 13, 14.5, 11, 12.5), DurationMean = c(76, 54, 40, 126,
198, 62), DepthMin = c(14.5, 16.5, 13, 14.5, 11, 12.5), DepthMax = c(14.5,
16.5, 13, 14.5, 11, 12.5), depth_range = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("shallow", "deep"), class = c("ordered",
"factor")), MidTime = structure(c(1477323906, 1477323973,
1477324022, 1477324107, 1477324359, 1477324511), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), year = c(2016, 2016, 2016, 2016,
2016, 2016), id = c("111868_16", "111868_16", "111868_16",
"111868_16", "111868_16", "111868_16"), segmentid = c("111868_16",
"111868_16", "111868_16", "111868_16", "111868_16", "111868_16"
), mu.x = c(-4446545.25191192, -4446557.10576816, -4446565.77504969,
-4446580.81370994, -4446625.40007808, -4446652.29459533),
mu.y = c(-2305423.86124176, -2305461.88537725, -2305489.69364377,
-2305537.93137917, -2305680.93056743, -2305767.17264774),
lon = c(-39.9439956132156, -39.944102098218, -39.944179975699,
-39.9443150702825, -39.9447155964422, -39.9449571940013),
lat = c(-20.3985940756941, -20.3989161274532, -20.3991516537744,
-20.3995602097098, -20.4007713539709, -20.4015017842338),
lq_closest_filt = c(7L, 7L, 7L, 7L, 7L, 7L), dt_closest_filt = c(0.0516666666666667,
0.0702777777777778, 0.0838888888888889, 0.1075, 0.1775, 0.219722222222222
), dist_closest_filt = c(0.103680210832692, 0.141026573116106,
0.168339162761167, 0.215717097671267, 0.356168027785347,
0.440874049523752), rel.angle = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), speed = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), depth_bin = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("(0,50]", "(50,100]", "(100,150]",
"(150,200]", "(200,250]", "(250,300]", "(300,350]", "(350,400]",
"(400,450]", "(450,500]", "(500,550]", "(550,600]", "(600,650]",
"(650,700]"), class = "factor"), bat = structure(list(depth = c(-59L,
-59L, -59L, -59L, -59L, -59L)), row.names = c(NA, 6L), class = "data.frame"),
area = c("AR", "AR", "AR", "AR", "AR", "AR")), row.names = c(NA,
6L), class = "data.frame")
Does anyone know how to fix this? Thanks!
It sounds like you might want a few rules to decide which rows with AM become AR: the number of consecutive AM rows is < 20, and the area that follows the AM run is not AA.
One approach would be to add columns related to these two rules, using rle. One column would hold the lengths, that is, the number of consecutive values within each repeated sequence. The other column would hold the "next" area, which is what decides whether the destination is back to the breeding area or onwards to the feeding area.
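To make the two helper columns concrete, here is a minimal sketch on a toy vector. The data and the variable names area, count and next_area are made up for illustration and are not taken from the question's data frame.

```r
# Toy sequence for one animal (hypothetical data)
area <- c("AR", "AR", "AM", "AM", "AR", "AM", "AM", "AM", "AA")

# rle() collapses the vector into runs: values and their lengths
r <- rle(area)

# Length of the run each row belongs to
count <- rep(r$lengths, r$lengths)

# Value of the run that follows; NA for the last run, which has no successor
next_area <- rep(c(r$values[-1], NA), r$lengths)

print(data.frame(area, count, next_area))
```

In this toy case the first AM run has length 2 and is followed by AR, while the second AM run has length 3 and is followed by AA.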
Finally, you can use a conditional statement to change to AR those rows that meet all of these criteria: area is AM, the area afterwards is not AA, and the run has fewer than 20 consecutive rows.
Here is the code:
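# Run-length encode the area column (assumes rows are in time order)
df_rle <- rle(df$area)
# next_area: area of the run that follows; count: length of the current run
df2 <- cbind(df, next_area = with(df_rle, rep(c(values[-1], NA), lengths)),
             count = with(df_rle, rep(lengths, lengths)))
# Short AM runs that are not followed by AA are relabelled as AR
df2$area <- ifelse(with(df2, area == "AM" & next_area != "AA" & count < 20),
                   "AR", df2$area)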