r street-address

normal_address() function in R not working as expected


The normal_address() function from the campfin package is not working as I'd expect it to.

I'm trying to use a piece of code like this:

df <- df %>% mutate(clean_add = normal_address(RESERVATION_ADDRESS, abbs=usps_street))

I'm expecting every word contained in usps_street$full to be replaced with its abbreviation. That happens most of the time, but not every time.

Is this a bug in normal_address(), or am I missing something? It causes addresses not to match when I attempt fuzzy matching in a later step (even though they are clearly the same when I look at them).

Below are some addresses I haven't been able to get normalized correctly:

structure(list(RESERVATION_ADDRESS = c("4620 ASH GROVE DRIVE #3B", 
"4001 DE MORADA DRIVE UNIT 118", "734 THOMPSON DRIVE, UNIT A", 
"5917 YORK BRIDGE CIRCLE, AUSTIN, TX", "4140 SUNLAND CIRCLE NW", 
"3951 BELLAIRE DRIVE SOUTH"), RESERVATION_CITY = c("SPRINGFIELD", 
"ODESSA", "LAKE DALLAS", "AUSTIN", "ALBUQUERQUE", "FORT WORTH"
), RESERVATION_STATE = c("IL", "TX", "TX", "TX", "NM", "TX"), 
    RESERVATION_ZIPCODE = c(62711, 79765, 75065, 78749, 87107, 
    76109)), row.names = c(NA, 6L), class = "data.frame")

I'm trying to avoid having to use something like `gsub("CIRCLE", "CIR", clean_add)` because there could be other words I'm missing besides "CIRCLE" or "DRIVE".

Is there a better function out there to do this? Or am I just missing something?


Solution

  • Current:

    > tt$RESERVATION_ADDRESS
    [1] "4620 ASH GROVE DRIVE #3B"            "4001 DE MORADA DRIVE UNIT 118"      
    [3] "734 THOMPSON DRIVE, UNIT A"          "5917 YORK BRIDGE CIRCLE, AUSTIN, TX"
    [5] "4140 SUNLAND CIRCLE NW"              "3951 BELLAIRE DRIVE SOUTH"   
    

    Probably desired output:

    > library(campfin)
    > normal_address(tt$RESERVATION_ADDRESS, abbs = usps_street, abb_end = FALSE)
    [1] "4620 ASH GRV DR #3B"         "4001 DE MORADA DR UNIT 118"  "734 THOMPSON DR UNIT A"     
    [4] "5917 YORK BRG CIR AUSTIN TX" "4140 SUNLAND CIR NW"         "3951 BELLAIRE DR S" 
    

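    That is, set abb_end = FALSE and normal_address() works as expected. The default is abb_end = TRUE, which, per the campfin documentation, abbreviates only the last word of each string, so a street type followed by a unit number or a direction is left unchanged. If this is the output you want, change your code to: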

    library(dplyr)
    library(campfin)
    df = 
      df |> 
      mutate(clean_add = normal_address(RESERVATION_ADDRESS, abbs = usps_street, abb_end = FALSE))
    

    Data:

    tt = structure(list(RESERVATION_ADDRESS = c("4620 ASH GROVE DRIVE #3B", 
                                                "4001 DE MORADA DRIVE UNIT 118", "734 THOMPSON DRIVE, UNIT A", 
                                                "5917 YORK BRIDGE CIRCLE, AUSTIN, TX", "4140 SUNLAND CIRCLE NW", 
                                                "3951 BELLAIRE DRIVE SOUTH"), RESERVATION_CITY = c("SPRINGFIELD", 
                                                                                                   "ODESSA", "LAKE DALLAS", "AUSTIN", "ALBUQUERQUE", "FORT WORTH"
                                                ), RESERVATION_STATE = c("IL", "TX", "TX", "TX", "NM", "TX"), 
                        RESERVATION_ZIPCODE = c(62711, 79765, 75065, 78749, 87107, 
                                                76109)), row.names = c(NA, 6L), class = "data.frame")
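
To see why abbreviating only the last word misses words like "DRIVE" in "4620 ASH GROVE DRIVE #3B", here is a minimal base R sketch of the two modes. It is not campfin's implementation: `abbreviate_words()`, its `end_only` argument, and the five-row `abbs` table are hypothetical stand-ins for normal_address(), abb_end, and usps_street, and the sketch skips the punctuation and whitespace cleanup that normal_address() also performs.

```r
# Hypothetical miniature lookup table; campfin's usps_street is much larger.
abbs <- data.frame(
  full = c("DRIVE", "CIRCLE", "GROVE", "BRIDGE", "SOUTH"),
  abb  = c("DR", "CIR", "GRV", "BRG", "S"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of campfin): replace whole words found in
# abbs$full with abbs$abb, either in every position or only in the last one.
abbreviate_words <- function(x, abbs, end_only = FALSE) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    # Positions that are candidates for replacement
    idx <- if (end_only) length(words) else seq_along(words)
    hit <- match(words[idx], abbs$full)
    found <- !is.na(hit)
    words[idx[found]] <- abbs$abb[hit[found]]
    paste(words, collapse = " ")
  }, character(1))
}

addr <- c("4620 ASH GROVE DRIVE #3B", "3951 BELLAIRE DRIVE SOUTH")

# Last word only: "#3B" is not in the table, and only "SOUTH" is replaced.
last_only <- abbreviate_words(addr, abbs, end_only = TRUE)
# "4620 ASH GROVE DRIVE #3B" "3951 BELLAIRE DRIVE S"

# Every word: all street words found in the table are replaced.
all_words <- abbreviate_words(addr, abbs, end_only = FALSE)
# "4620 ASH GRV DR #3B" "3951 BELLAIRE DR S"
```

The last-word mode makes sense when every address ends in its street type, but addresses with trailing unit numbers, directions, or city names need the every-word mode, which is what abb_end = FALSE requests in the answer above.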