rdata-analysissocrata

Need help pulling JSON data with RSocrata from a website API


I need help drafting code that pulls public data directly from a website that is in Socrata format. Here is a link:

https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w

There is an API endpoint:

https://data.cityofchicago.org/resource/xzkq-xp2w.json

After the data is uploaded, null values in the "Annual Salary" should be replaced with 50000.


Solution

  • We can use the RSocrata package

    library(RSocrata)
    url <- "https://data.cityofchicago.org/resource/xzkq-xp2w.json"
    data <- RSocrata::read.socrata(url)
    head(data)
    #                             name                                    job_titles       department full_or_part_time salary_or_hourly annual_salary typical_hours hourly_rate
    #1               AARON,  JEFFERY M                                      SERGEANT           POLICE                 F           Salary        111444          <NA>        <NA>
    #2                  AARON,  KARINA        POLICE OFFICER (ASSIGNED AS DETECTIVE)           POLICE                 F           Salary         94122          <NA>        <NA>
    #3             AARON,  KIMBERLEI R                      CHIEF CONTRACT EXPEDITER             DAIS                 F           Salary        118608          <NA>        <NA>
    #4             ABAD JR,  VICENTE M                             CIVIL ENGINEER IV      WATER MGMNT                 F           Salary        117072          <NA>        <NA>
    #5              ABARCA,  FRANCES J                                POLICE OFFICER           POLICE                 F           Salary         48078          <NA>        <NA>
    

    The following will replace the NAs in annual_salary with 50000.

    data[is.na(data$annual_salary),"annual_salary"] <- 50000
    

    However, if you'd like to do what it suggests on the city of Chicago website, you could consider multipling typical_hours with hourly_rate to estimate salary.

    ind <- is.na(data$annual_salary)
    data[ind,]$annual_salary <- as.numeric(data[ind,]$typical_hours) * as.numeric(data[ind,]$hourly_rate) * 52