r tidyverse stata

How to do the equivalent of Stata's inrange() function in R?


This is similar to "What is the equivalent of Stata function inlist() in R?", but about inrange().

In Stata, I can select observations of (numeric) variable var that fall within a certain [x,y] range with inrange(var, x, y). For instance:

keep if inrange(var, 50, 100)

What is the equivalent in R, preferably within the tidyverse?

I know that I can do the following, but is there a faster way?

data %>% filter(var>=50 & var<=100)

Solution

  • To summarize the comments: between() from dplyr or data.table is probably the closest thing to Stata’s inrange() function. Conveniently, the syntax is also very similar, and both versions include the bounds by default.

    library(palmerpenguins)
    library(dplyr)
    
    penguins |>
      filter(between(body_mass_g, left = 2000, right = 3000)) |>
      head(n = 5)
    #> # A tibble: 5 × 8
    #>   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
    #>   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
    #> 1 Adelie  Dream            37            16.9               185        3000
    #> 2 Adelie  Dream            37.5          18.9               179        2975
    #> 3 Adelie  Biscoe           34.5          18.1               187        2900
    #> 4 Adelie  Biscoe           36.5          16.6               181        2850
    #> 5 Adelie  Biscoe           36.4          17.1               184        2850
    #> # ℹ 2 more variables: sex <fct>, year <int>
    
    ## between() also works with base R subsetting (rows with NA body_mass_g come back as NA rows)
    
    penguins[between(penguins$body_mass_g, left = 2000, right = 3000),] |>
      head(n = 5)
    #> # A tibble: 5 × 8
    #>   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
    #>   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
    #> 1 <NA>    <NA>             NA            NA                  NA          NA
    #> 2 Adelie  Dream            37            16.9               185        3000
    #> 3 Adelie  Dream            37.5          18.9               179        2975
    #> 4 Adelie  Biscoe           34.5          18.1               187        2900
    #> 5 Adelie  Biscoe           36.5          16.6               181        2850
    #> # ℹ 2 more variables: sex <fct>, year <int>
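
    The all-NA first row in the base R output comes from between() returning NA where body_mass_g is missing: `[` keeps a row of NAs for every NA index, whereas filter() silently drops it. A minimal base R sketch of one way around this, assuming you want missing values excluded; the toy data frame `df` and its columns are made up for illustration:

    ```r
    # Toy data with one missing value (hypothetical, for illustration only)
    df <- data.frame(id = 1:5, mass = c(2900, NA, 3000, 3500, 2000))

    # Inclusive range test written out in base R; NA in `mass` gives NA here
    in_range <- df$mass >= 2000 & df$mass <= 3000

    # Logical indexing with an NA produces an all-NA row
    with_na <- df[in_range, ]

    # which() keeps only the positions that are TRUE, dropping NA and FALSE
    without_na <- df[which(in_range), ]
    ```

    Here `with_na` has four rows (one of them all NA) and `without_na` has the three matching rows only.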
    

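    The data.table version differs a little: setting incbounds = FALSE makes both bounds exclusive, so it tests lower < x < upper instead of lower <= x <= upper. Note that the 3000 g penguin from the first example is no longer returned.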

    library(palmerpenguins)
    library(dplyr)
    
      
    penguins |>
      filter(data.table::between(body_mass_g, lower = 2000, upper = 3000, incbounds = FALSE)) |>
      head(n = 5)
    
    #> # A tibble: 5 × 8
    #>   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
    #>   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
    #> 1 Adelie  Dream            37.5          18.9               179        2975
    #> 2 Adelie  Biscoe           34.5          18.1               187        2900
    #> 3 Adelie  Biscoe           36.5          16.6               181        2850
    #> 4 Adelie  Biscoe           36.4          17.1               184        2850
    #> 5 Adelie  Dream            33.1          16.1               178        2900
    #> # ℹ 2 more variables: sex <fct>, year <int>
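
    If you would rather not depend on either package, the same check is short to write in base R. The sketch below is a hypothetical helper, named `inrange()` only to mirror the Stata spelling; its `incbounds` argument imitates the data.table option of the same name, and treating missing values as out of range is an assumption you may want to change:

    ```r
    # Hypothetical helper, not part of base R, dplyr, or data.table
    inrange <- function(x, lower, upper, incbounds = TRUE) {
      res <- if (incbounds) {
        x >= lower & x <= upper   # closed interval [lower, upper]
      } else {
        x > lower & x < upper     # open interval (lower, upper)
      }
      # Assumption: a missing value counts as "not in range" instead of NA
      res[is.na(res)] <- FALSE
      res
    }

    mass <- c(2000, 2500, 3000, NA, 3500)
    inrange(mass, 2000, 3000)
    #> [1]  TRUE  TRUE  TRUE FALSE FALSE
    inrange(mass, 2000, 3000, incbounds = FALSE)
    #> [1] FALSE  TRUE FALSE FALSE FALSE
    ```

    Because the result is a plain logical vector without NAs, it can be used inside filter() or with base R's `[` without producing NA rows.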
    

    Created on 2024-09-09 with reprex v2.1.1