rconditional-statementsrepeatbreakpointsconditional-breakpoint

Create new column with condition by repeat values and when have break points


My data is about 40 animals (ids) with locations by telemetry, and I have already stipulated 3 areas. The first one is AR, where is breeding area, AM migration, and AA is feeding area. The first locations of all animals is at AR. But sometimes the animal is in breeding period yet (at AR), but can go out to AM few times, and then came back to AR. Only when the animal have only AM they started the migration, until arrive at feeding area AA. So, they start at AR, then start the migration AM and then arrive at the feeding area AA.

Im trying to create a new column with some conditions that I dont know yet how to do, I have this data frame for example

id     area   
2304   AR
2304   AR
2304   AR
2304   AM  #this AM for example, can repeat until 20 times and then came back to AR
2304   AM
2304   AR
2304   AR
2304   AR
2304   AM
2304   AM
2304   AM
2304   AM
2304   ...
2304   AM
2304   AM
2304   AM
2304   AA
2304   AA
2304   ...
2304   AA

So, when have AR x times and after this have one or until 20 AM and came back have AR, I want a new column with AR. By the moment when have AM x times and only AM, without come back to AR, I want new column with AM. Like this:

And AA its ok, AA = AA always

I expected this:

id    area    fixed_area
2304   AR      AR
2304   AR      AR
2304   AR      AR
2304   AM      AR  #this AM for example, can repeat until 20 times and then came back to AR
2304   AM      AR
2304   AR      AR
2304   AR      AR
2304   AR      AR
2304   AM      AM
2304   AM      AM
2304   AM      AM
2304   AM      AM
2304   ...     ...
2304   AM      AM
2304   AM      AM
2304   AM      AM
2304   AA      AA
2304   AA      AA
2304   ...    ...
2304   AA      AA

I tryed this:

but the AA is missing, maybe the problem is because need do this separation per animal (id)

> table(df$area)

   AA    AM    AR 
31460 39101 28820 

class(df$area)
[1] "character"
> idx <- with(rle(as.character(df$area)), rep(seq_along(lengths), lengths))
> df$fixed_area <- with(df, replace(area, idx < max(idx[area == 'AM']), 'AR'))
> table(df$fixed_area)

   AM    AR 
  145 99236 
> 

After this I dput the data frame, but my data frame have more than 90.000 rows, so I copied just head values

> dput(head(df))
structure(list(DeployID = c("111868_16", "111868_16", "111868_16", 
"111868_16", "111868_16", "111868_16"), Start = structure(c(1477323868, 
1477323946, 1477324002, 1477324044, 1477324260, 1477324480), class = c("POSIXct", 
"POSIXt"), tzone = "GMT"), End = structure(c(1477323944, 1477324000, 
1477324042, 1477324170, 1477324458, 1477324542), class = c("POSIXct", 
"POSIXt"), tzone = "GMT"), What = structure(c(1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("Dive", "Message", "Surface"), class = "factor"), 
    Shape = structure(c(2L, 4L, 3L, 2L, 2L, 2L), .Label = c("", 
    "Square", "U", "V"), class = "factor"), DepthMean = c(14.5, 
    16.5, 13, 14.5, 11, 12.5), DurationMean = c(76, 54, 40, 126, 
    198, 62), DepthMin = c(14.5, 16.5, 13, 14.5, 11, 12.5), DepthMax = c(14.5, 
    16.5, 13, 14.5, 11, 12.5), depth_range = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = c("shallow", "deep"), class = c("ordered", 
    "factor")), MidTime = structure(c(1477323906, 1477323973, 
    1477324022, 1477324107, 1477324359, 1477324511), class = c("POSIXct", 
    "POSIXt"), tzone = "GMT"), year = c(2016, 2016, 2016, 2016, 
    2016, 2016), id = c("111868_16", "111868_16", "111868_16", 
    "111868_16", "111868_16", "111868_16"), segmentid = c("111868_16", 
    "111868_16", "111868_16", "111868_16", "111868_16", "111868_16"
    ), mu.x = c(-4446545.25191192, -4446557.10576816, -4446565.77504969, 
    -4446580.81370994, -4446625.40007808, -4446652.29459533), 
    mu.y = c(-2305423.86124176, -2305461.88537725, -2305489.69364377, 
    -2305537.93137917, -2305680.93056743, -2305767.17264774), 
    lon = c(-39.9439956132156, -39.944102098218, -39.944179975699, 
    -39.9443150702825, -39.9447155964422, -39.9449571940013), 
    lat = c(-20.3985940756941, -20.3989161274532, -20.3991516537744, 
    -20.3995602097098, -20.4007713539709, -20.4015017842338), 
    lq_closest_filt = c(7L, 7L, 7L, 7L, 7L, 7L), dt_closest_filt = c(0.0516666666666667, 
    0.0702777777777778, 0.0838888888888889, 0.1075, 0.1775, 0.219722222222222
    ), dist_closest_filt = c(0.103680210832692, 0.141026573116106, 
    0.168339162761167, 0.215717097671267, 0.356168027785347, 
    0.440874049523752), rel.angle = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), speed = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), depth_bin = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = c("(0,50]", "(50,100]", "(100,150]", 
    "(150,200]", "(200,250]", "(250,300]", "(300,350]", "(350,400]", 
    "(400,450]", "(450,500]", "(500,550]", "(550,600]", "(600,650]", 
    "(650,700]"), class = "factor"), bat = structure(list(depth = c(-59L, 
    -59L, -59L, -59L, -59L, -59L)), row.names = c(NA, 6L), class = "data.frame"), 
    area = c("AR", "AR", "AR", "AR", "AR", "AR")), row.names = c(NA, 
6L), class = "data.frame") 

Someone know how fix this? Thanks!


Solution

  • It sounds like you might want a few rules employed to decide which rows with AM become AR.

    One approach would be to add columns related to these two rules, using rle. One column would have lengths for number of consecutive values within a repeated sequence. The other column would have the "next" area. This would be relevant to decide if the destination is back to breeding area, or onwards to a feeding area.

    Finally, you can use a conditional statement and change those rows with AM to AR that meet these criteria:

    Here is the code:

    df_rle <- rle(df$area)
    df2 <- cbind(df, next_area = with(df_rle, rep(c(values[-1], NA), lengths)),
                     count = with(df_rle, rep(lengths, lengths)))
    df2$area <- ifelse(with(df2, area == "AM" & next_area != "AA" & count < 20),
                       "AR", df2$area)