I'm using an open source project called OpenTripPlanner, which I plan to use to simulate a lot of itineraries from one point to another at a given time. So far, I've managed to find the URL where an XML file containing all the information about an itinerary is located. The XML is built upon request, so the URL isn't static. The URL looks something like this:
(You need to have an OpenTripPlanner server running to open it)
Now I want to read these XML files and do some data analysis using Python 3, but I can't find a way to read them. I've tried using urllib.request to download the file locally, but the file I get is oddly formed. It looks something like this:
{"requestParameters":{"date":"2017/12/04","mode":"TRANSIT,WALK","fromPlace":"48.40915, -71.04996","toPlace":"48.41428, -71.06996","time":"8:00:00"},"plan":{"date":1512392400000,"from":{"name":"Origin","lon":-71.04996,"lat":48.40915,"orig":"","vertexType":"NORMAL"},"to":{"name":"Destination","lon":-71.06996,"lat":48.41428,"orig":"","vertexType":"NORMAL"},"itineraries":[{"duration":1538,"startTime":1512392809000,"endTime":1512394347000,"walkTime":934,"transitTime":602,"waitingTime":2,"walkDistance":1189.6595112715966,"walkLimitExceeded":false,"elevationLost":0.0,"elevationGained":0.0,"transfers":0,"legs":[{"startTime":1512392809000,"endTime":1512393537000,"departureDelay":0,"arrivalDelay":0,"realTime":false,"distance":926.553,"pathway":false,"mode":"WALK","route":"","agencyTimeZoneOffset":-18000000,"interlineWithPreviousLeg":false,"from":{"name":"Origin","lon":-71.04996,"lat":48.40915,"departure":1512392809000,"orig":"","vertexType":"NORMAL"},"to":{"name":"Roitelets / Martinets","stopId":"1:370","stopCode":"370","lon":-71.047688,"lat":48.401531,"arrival":1512393537000,"departure":1512393538000,"stopIndex":15,"stopSequence":16,"vertexType":"TRANSIT"},"legGeometry":{"points":"s{mfHb{spL|ExBp@sDl@V@@lB|@j@FL?j@GbCk@|A]vEsA^KBA|C{@pCeACS~CuA`@Q","length":19},"rentedBike":false,"transitLeg":false,"duration":728.0,"steps":[{"distance":131.991,"relativeDirection":"DEPART","streetName":"Rue D.-V.-Morrier","absoluteDirection":"SOUTH","stayOn":false,"area":false,"bogusName":false,"lon":-71.04961760502248,"lat":48.4090671692228,"elevation":[]},{"distance":72.319,"relativeDirection":"LEFT","streetName":"Rue Lorenzo-Genest","absoluteDirection":"EAST","stayOn":false,"area":false,"bogusName":false,"lon":-71.0502299,"lat":48.4079519,"elevation":[]}
And when I try to open the file in a browser, I get an error that says
XML Parsing Error: not well-formed
Location: http://localhost:63342/XML_reader/file.xml?_ijt=e1d6h53s4mh1ak94sqortejf9v
Line Number 1, Column 1: ...
The script I'm using is very simple. It looks like this:
import urllib.request
testfile = urllib.request.URLopener()
file_name = 'http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK'
testfile.retrieve(file_name, "file.xml")
How can I make the output XML files well-formed? Is there another way besides urllib.request that I may want to try?
Thanks a lot
The file you downloaded is JSON, not XML, which is why the browser reports it as not well-formed. To import it as JSON data, use the json library:
import json
import urllib.request
from pprint import pprint

file_name = 'http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK'
# urlretrieve is used here instead of the deprecated URLopener().retrieve()
urllib.request.urlretrieve(file_name, "file.json")

# Parse the saved response into Python dicts and lists
with open('file.json', encoding='utf-8') as f:
    data = json.load(f)
pprint(data)
json.load reads the JSON data and converts it into a Python object (https://docs.python.org/2/library/json.html?highlight=json%20load#json.load).
pprint is for "pretty printing" the JSON data (https://docs.python.org/2/library/pprint.html).
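Since you want to simulate many itineraries, you may not need the intermediate file at all. Below is a minimal sketch that builds the request URL from its parameters, parses the response in memory, and pulls a few fields out of each itinerary. The function names (build_plan_url, fetch_plan, summarize_itineraries) are my own, the base URL is copied from your example, and the field names are the ones visible in the output you posted. Judging from that sample, startTime is in milliseconds since the epoch and duration is in seconds, but treat that as an assumption to check against your own data.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumed base URL, copied from the example request in the question.
BASE_URL = "http://localhost:8080/otp/routers/default/plan"


def build_plan_url(from_place, to_place, date, time, mode="TRANSIT,WALK"):
    """Build a plan request URL from the query parameters used in the question."""
    query = urllib.parse.urlencode({
        "fromPlace": from_place,
        "toPlace": to_place,
        "date": date,
        "time": time,
        "mode": mode,
    })
    return BASE_URL + "?" + query


def fetch_plan(url):
    """Request the URL and parse the body as JSON (needs a running server)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize_itineraries(plan_response):
    """Return one small dict per itinerary found in a parsed plan response."""
    rows = []
    itineraries = plan_response.get("plan", {}).get("itineraries", [])
    for itinerary in itineraries:
        rows.append({
            # Assumption: startTime is milliseconds since the Unix epoch.
            "start": datetime.fromtimestamp(
                itinerary["startTime"] / 1000, tz=timezone.utc
            ),
            # Assumption: durations are expressed in seconds.
            "duration_s": itinerary["duration"],
            "walk_time_s": itinerary["walkTime"],
            "transit_time_s": itinerary["transitTime"],
            "transfers": itinerary["transfers"],
            "modes": [leg["mode"] for leg in itinerary["legs"]],
        })
    return rows
```

With the server running, something like summarize_itineraries(fetch_plan(build_plan_url("48.40915,-71.04996", "48.41428,-71.06996", "2017/12/04", "8:00:00"))) gives you a list of plain dicts that you can loop over or hand to whatever analysis library you prefer.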