
Using R to convert GTFS spatial data from character to numeric


I am following the gtfstools vignette (https://cran.r-project.org/web/packages/gtfstools/vignettes/gtfstools.html) but am getting stuck on the data format. I am reading a GTFS feed, which is a zip archive containing .txt files:

library(gtfstools)

# GTFS_path is the folder holding the downloaded feed
ART2019Path <- file.path(GTFS_path, "2019-10 Arlington.zip")
ART2019GTFS <- read_gtfs(ART2019Path)

Here is the data: https://realtime.commuterpage.com/rtt/public/utility/gtfs.aspx

The data loads fine, but it seems to be read entirely as character, and I need most of it to be numeric for my analysis. For example, to plot the transit geometry:

trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
plot(trip_geom$geometry)
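A quick way to check which columns really are character is to list the class of every column, table by table. The sketch below builds a small hypothetical stand-in for the object that read_gtfs() returns (a named list with one table per GTFS file); the coordinate values are made up, and with the real feed you would pass ART2019GTFS instead of gtfs.

```r
# Hypothetical stand-in for the list returned by read_gtfs():
# a named list with one table per GTFS text file. The real tables are
# data.tables; plain data frames are used here to stay base-R only.
gtfs <- list(
  stops = data.frame(
    stop_id  = c("83", "85"),
    stop_lat = c(38.88, 38.89),
    stop_lon = c(-77.11, -77.10),
    stringsAsFactors = FALSE
  ),
  shapes = data.frame(
    shape_id          = c("9", "9"),
    shape_pt_lon      = c(-77.10, -77.10),
    shape_pt_lat      = c(38.88, 38.89),
    shape_pt_sequence = c(1L, 2L),
    stringsAsFactors  = FALSE
  )
)

# First class of every column, table by table
col_classes <- lapply(gtfs, function(tbl) {
  vapply(tbl, function(col) class(col)[1], character(1))
})
print(col_classes)
```

str(ART2019GTFS) reports the same information in more detail, including a preview of each column's values.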

I tried converting every column with mutate_all(), assuming columns without numbers would stay as character, but it didn't work:

ART2019GTFS <- mutate_all(ART2019GTFS, funs(as.numeric))
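Two things likely go wrong with that call. First, ART2019GTFS is not a single data frame but a list of tables, so mutate_all(), which operates on one data frame, cannot be applied to it directly (funs() is also deprecated in current dplyr). Second, as.numeric() does not leave text untouched: a non-numeric string becomes NA with a warning. The base-R sketch below uses a hypothetical helper, convert_numeric_like(), and a toy table with made-up values; it converts only the character columns whose every non-blank value parses as a number, and applies the helper to each table with lapply().

```r
# Hypothetical helper: convert a character column to numeric only when
# every non-missing, non-empty value parses as a number.
# Written for plain data frames; the real GTFS tables are data.tables.
convert_numeric_like <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.character(x)) next
    blank <- is.na(x) | x == ""
    num <- suppressWarnings(as.numeric(x))
    if (any(!blank) && all(!is.na(num[!blank]))) {
      num[blank] <- NA_real_  # empty strings become missing values
      df[[col]] <- num
    }
  }
  df
}

# Toy table standing in for one GTFS file (hypothetical values)
stops <- data.frame(
  stop_code = c("51001", "51003", ""),
  stop_name = c("Stop A", "Stop B", "Stop C"),
  stringsAsFactors = FALSE
)

# as.numeric() does not skip text: every name becomes NA
suppressWarnings(as.numeric(stops$stop_name))  # NA NA NA

# Apply the helper table by table, since the GTFS object is a list
gtfs <- list(stops = stops)
gtfs_num <- lapply(gtfs, convert_numeric_like)
str(gtfs_num$stops)
```

Note that lapply() returns a plain list without the dt_gtfs/gtfs class, so the result may no longer be accepted by gtfstools functions. GTFS also defines IDs such as stop_id and trip_id as text; converting them can drop leading zeros and break joins between tables.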

I am relatively new to R, so I am not sure how to tackle this.

Any help figuring this out would be appreciated.


Solution

  • When I follow that link, I get a zip file named google_transit.zip, which contains several comma-separated text files. When I run this:

    library(gtfstools)
    ART2019GTFS <- read_gtfs("~/google_transit.zip")
    

    I get this (one data.table for each text file):

    > str(ART2019GTFS)
    List of 8
     $ agency        :Classes ‘data.table’ and 'data.frame':    1 obs. of  6 variables:
      ..$ agency_id      : chr "1"
      ..$ agency_name    : chr "Arlington Transit"
      ..$ agency_url     : chr "http://www.arlingtontransit.com"
      ..$ agency_phone   : chr "703-228-7433"
      ..$ agency_timezone: chr "America/New_York"
      ..$ agency_lang    : chr "en"
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ calendar      :Classes ‘data.table’ and 'data.frame':    5 obs. of  10 variables:
      ..$ service_id: chr [1:5] "1" "2" "3" "4" ...
      ..$ monday    : int [1:5] 1 0 1 0 0
      ..$ tuesday   : int [1:5] 1 0 1 0 0
      ..$ wednesday : int [1:5] 1 0 1 0 0
      ..$ thursday  : int [1:5] 1 0 1 0 0
      ..$ friday    : int [1:5] 0 1 1 0 0
      ..$ saturday  : int [1:5] 0 0 0 1 0
      ..$ sunday    : int [1:5] 0 0 0 0 1
      ..$ start_date: Date[1:5], format: "2022-03-27" "2022-03-27" "2022-03-27" ...
      ..$ end_date  : Date[1:5], format: "2023-12-31" "2023-12-31" "2023-12-31" ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ calendar_dates:Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
      ..$ service_id    : chr [1:3] "1" "3" "5"
      ..$ date          : Date[1:3], format: "2022-05-30" "2022-05-30" "2022-05-30"
      ..$ exception_type: int [1:3] 2 2 1
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ routes        :Classes ‘data.table’ and 'data.frame':    21 obs. of  8 variables:
      ..$ route_id        : chr [1:21] "41" "42" "43" "45" ...
      ..$ agency_id       : chr [1:21] "1" "1" "1" "1" ...
      ..$ route_short_name: chr [1:21] "41" "42" "43" "45" ...
      ..$ route_long_name : chr [1:21] "Columbia Pike-Ballston-Court House" "Ballston-Pentagon" "Crystal City-Courthouse" "Columbia Pike-DHS/Sequoia-Rosslyn" ...
      ..$ route_type      : int [1:21] 3 3 3 3 3 3 3 3 3 3 ...
      ..$ route_color     : chr [1:21] "DCC154" "D7171F" "BC1B8D" "0084CA" ...
      ..$ route_text_color: chr [1:21] "FFFFFF" "FFFFFF" "FFFFFF" "FFFFFF" ...
      ..$ route_url       : chr [1:21] "https://www.arlingtontransit.com/routes-schedules/art-41/" "https://www.arlingtontransit.com/routes-schedules/art-42/" "https://www.arlingtontransit.com/routes-schedules/art-43/" "https://www.arlingtontransit.com/routes-schedules/art-45/" ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ shapes        :Classes ‘data.table’ and 'data.frame':    10721 obs. of  4 variables:
      ..$ shape_id         : chr [1:10721] "9" "9" "9" "9" ...
      ..$ shape_pt_lon     : num [1:10721] -77.1 -77.1 -77.1 -77.1 -77.1 ...
      ..$ shape_pt_lat     : num [1:10721] 38.9 38.9 38.9 38.9 38.9 ...
      ..$ shape_pt_sequence: int [1:10721] 1 2 3 4 5 6 7 8 9 10 ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ stop_times    :Classes ‘data.table’ and 'data.frame':    57711 obs. of  7 variables:
      ..$ trip_id       : chr [1:57711] "1" "1" "1" "1" ...
      ..$ arrival_time  : chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
      ..$ departure_time: chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
      ..$ stop_id       : chr [1:57711] "138" "141" "867" "144" ...
      ..$ stop_sequence : int [1:57711] 1 2 3 4 5 6 7 8 9 10 ...
      ..$ stop_headsign : chr [1:57711] "" "" "" "" ...
      ..$ timepoint     : int [1:57711] 1 0 0 1 0 0 0 0 0 0 ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ stops         :Classes ‘data.table’ and 'data.frame':    640 obs. of  6 variables:
      ..$ stop_id  : chr [1:640] "83" "85" "87" "89" ...
      ..$ stop_code: chr [1:640] "51001" "51003" "51005" "51007" ...
      ..$ stop_name: chr [1:640] "Ballston Metro G, Fairfax Dr, EB @ N Stafford, NS" "Fairfax Drive, WB @ N Utah Street, FS" "16th Street N, WB @ N Glebe Road, FS" "16th Street N, WB @ N Buchanan Street, NS" ...
      ..$ stop_lat : num [1:640] 38.9 38.9 38.9 38.9 38.9 ...
      ..$ stop_lon : num [1:640] -77.1 -77.1 -77.1 -77.1 -77.1 ...
      ..$ stop_url : chr [1:640] "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51001#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51003#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51005#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51007#realTimeResultsContainer" ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ trips         :Classes ‘data.table’ and 'data.frame':    2296 obs. of  7 variables:
      ..$ route_id     : chr [1:2296] "52" "52" "52" "52" ...
      ..$ service_id   : chr [1:2296] "3" "3" "3" "3" ...
      ..$ trip_id      : chr [1:2296] "1" "2" "3" "4" ...
      ..$ trip_headsign: chr [1:2296] "Ballston Metro" "Ballston Metro" "Ballston Metro" "Ballston Metro" ...
      ..$ direction_id : int [1:2296] 0 0 0 0 0 1 1 1 1 1 ...
      ..$ block_id     : chr [1:2296] "5202" "5202" "5202" "5202" ...
      ..$ shape_id     : chr [1:2296] "76" "76" "76" "76" ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     - attr(*, "class")= chr [1:3] "dt_gtfs" "gtfs" "list"
    

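    The coordinate columns (shape_pt_lon, shape_pt_lat, stop_lat, stop_lon) are already num and the sequence columns are int; the columns left as chr are mostly IDs, names, URLs and times. With the data as read, this appears to succeed: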

    > trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
    > str(trip_geom)
    Classes ‘sf’, ‘data.table’ and 'data.frame':    2296 obs. of  3 variables:
     $ trip_id    : chr  "1" "2" "3" "4" ...
     $ origin_file: chr  "shapes" "shapes" "shapes" "shapes" ...
     $ geometry   :sfc_LINESTRING of length 2296; first list element:  'XY' num [1:131, 1:2] -77.2 -77.2 -77.2 -77.2 -77.2 ...
     - attr(*, "sf_column")= chr "geometry"
     - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
      ..- attr(*, "names")= chr [1:2] "trip_id" "origin_file"
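In the str() output above, arrival_time and departure_time in stop_times are still chr because they hold clock times such as "10:25:00". If numbers are needed there, one option is to convert them to seconds after midnight. The sketch below uses a hypothetical base-R helper, time_to_seconds(); it splits on ":" rather than using strptime(), because GTFS allows hours of 24 or more for trips that run past midnight. The "25:05:30" value is made up to illustrate that case.

```r
# Hypothetical helper: convert GTFS "HH:MM:SS" strings to seconds after midnight.
# Splitting on ":" (instead of strptime()) keeps hours >= 24 valid, which GTFS
# uses for trips that continue past midnight.
time_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    # Empty or malformed values (not exactly three fields) become NA
    if (length(p) != 3) return(NA_real_)
    n <- suppressWarnings(as.numeric(p))
    n[1] * 3600 + n[2] * 60 + n[3]
  }, numeric(1))
}

# Example values in the style of stop_times$arrival_time
arrival <- c("10:25:00", "10:27:25", "25:05:30", "")
time_to_seconds(arrival)
# returns 37500 37645 90330 NA
```

Applied to the real feed, time_to_seconds(ART2019GTFS$stop_times$arrival_time) would return one numeric value per stop time, which can then be used for headway or travel-time calculations.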