r dplyr

R - greatest common divisor dplyr routine


I need to find the greatest common divisor (gcd) for a set of durations: dur.

My data look like this

            actrec dur
1  c Personal Care 120
2      c Free Time  10
3      c Free Time  70
4      c Free Time  40
5         b Unpaid  10
6      c Free Time  20
7  c Personal Care  30
8      c Free Time  40
9      c Free Time  40
10     c Free Time  10 
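
For anyone who wants to run the snippets, the ten rows above can be rebuilt roughly as follows. This is only a sketch: it assumes actrec is a character column and dur is numeric, which the printout alone does not confirm.

```r
# Rebuild the ten rows shown above as a data frame named dt.
# Column types are an assumption (character for actrec, numeric for dur).
dt <- data.frame(
  actrec = c("c Personal Care", "c Free Time", "c Free Time", "c Free Time",
             "b Unpaid", "c Free Time", "c Personal Care", "c Free Time",
             "c Free Time", "c Free Time"),
  dur = c(120, 10, 70, 40, 10, 20, 30, 40, 40, 10),
  stringsAsFactors = FALSE
)
```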

I am using the gcd function from the schoolmath package. I loop through my data and store the gcd of each pair of consecutive values in the vector v. Finally, I take the min of v as the gcd of my data.

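library(schoolmath)

l = length(dt$dur)
v = array(0, l)   # v[1] stays 0 and is dropped below

# gcd of each value with the one before it
for(i in 2:l){
  v[i] = gcd(dt$dur[i], dt$dur[i-1])
}

minV = min(v[-1])   # smallest of the pairwise gcds
minV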

Which gives 10.

However, I have trouble translating this routine into dplyr.

I thought of something like this, using lag in place of the for loop:

dt %>% mutate(gcd(dur, lag(dur, 0))) 

But it isn't working. And I am unsure how to insert min.

Any clue?


Solution

  • We can take the lag of 'dur', use rowwise to apply the gcd function on each row, then extract 'new1' and get the min

    dt %>%
       mutate(dur1 = lag(dur, default = dur[1])) %>%   # previous duration
       rowwise() %>%                                    # apply gcd one row at a time
       mutate(new1 = gcd(dur, dur1)) %>%
       .$new1 %>%
       tail(., -1) %>%                                  # drop row 1 (gcd of dur[1] with itself)
       min
    #[1] 10
    

    Or we can create a vectorized version of 'gcd' with Vectorize and apply it to the 'dur' column

     gcdV <- Vectorize(function(x,y) gcd(x, y))
     dt %>%
       mutate(new1 = gcdV(dur, lag(dur, default = dur[1])))
    

    and then get the min as in the solution above.
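
One caveat worth noting: the smallest gcd of adjacent pairs matches the gcd of the whole set for this data, but not in general. A minimal sketch of the alternative, folding a two-argument gcd over the vector with base R's Reduce, is shown below. The helper gcd2 is hypothetical (a plain Euclidean algorithm written here so the example does not depend on schoolmath), and it assumes non-negative whole-number durations.

```r
# Hypothetical helper, not part of schoolmath: Euclid's algorithm
# for two non-negative whole numbers.
gcd2 <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

dur <- c(120, 10, 70, 40, 10, 20, 30, 40, 40, 10)

# Fold gcd2 over the vector: gcd2(gcd2(gcd2(d1, d2), d3), ...)
Reduce(gcd2, dur)
#[1] 10

# A case where the two approaches disagree
x <- c(6, 10, 15)
min(mapply(gcd2, x[-1], x[-length(x)]))   # smallest adjacent-pair gcd
#[1] 2
Reduce(gcd2, x)                           # gcd of the whole set
#[1] 1
```

Inside a dplyr pipeline the same fold could be written as dt %>% summarise(g = Reduce(gcd2, dur)), which also removes the need for lag and tail.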