Tags: python, python-3.x, python-requests, opendata

Inexplicable formatting magic on API Endpoint


I'm writing a wrapper for Deutsche Bahn's Fahrplan OpenData API.

However, I cannot seem to reproduce the result of a simple curl request. This is what I do in Python:

>>> import requests
>>> header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
>>> departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

# Now, using a journey's details ID, let's request some journey details from the endpoint
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)
<Response [404]>
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header).request.url
'https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

Alright, so far, so bad. As you can see, I'm using the data exactly as given to me. Now, when I call the endpoint via the website, it tells me it runs this curl command:

curl -X GET --header "Accept: application/json" --header "Authorization: Bearer 36e39957ace6f405a82cfb09522d0a8d" "https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160"

And this bit of magic happens:

the original journey ID

'782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

becomes:

'782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160'

and returns a status 200.

Seemingly out of nowhere, the journey ID got some characters added to it. I copied and pasted it into the given field and did nothing more, so I know it wasn't me.

I believe there is some sort of encoding/decoding happening, but I've never seen this before, and honestly don't know what to make of it.

How do I handle this in my code? Clearly I need to do something in addition to simply parsing the departures endpoint? Or, better yet, am I simply missing out on something obvious?

I've sent multiple emails to the DB developers, but so far I have not heard back from them.


Solution

  • In v1 of the API, there are four endpoints defined:

    GET /location/{name}
    GET /arrivalBoard/{id}
    GET /departureBoard/{id}
    GET /journeyDetails/{id}
    

    Each of them expects a path parameter ({name} or {id}). The value you give this parameter must be URL-encoded, which is something you neglected to do.
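As a minimal sketch of that rule, the helper below builds the URL for any of the four endpoints and percent-encodes the path parameter. The name endpoint_url and the station name 'Frankfurt (Main) Hbf' are assumptions made for illustration; they do not come from the API documentation.

```python
from urllib.parse import quote

BASE_URL = 'https://api.deutschebahn.com/fahrplan-plus/v1'


def endpoint_url(endpoint, value):
    """Build an endpoint URL with a percent-encoded path parameter.

    Hypothetical helper, not part of any Deutsche Bahn client library.
    safe='' makes quote() escape '/' as well, which it leaves alone by default.
    """
    return '{}/{}/{}'.format(BASE_URL, endpoint, quote(value, safe=''))


# Illustrative values: a station name containing spaces and parentheses,
# and the numeric station ID used in the question.
location_url = endpoint_url('location', 'Frankfurt (Main) Hbf')
board_url = endpoint_url('departureBoard', '8011160')

print(location_url)
print(board_url)
```

Purely numeric IDs pass through unchanged, so the encoding step only becomes visible once the value contains reserved characters such as spaces, slashes or percent signs.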

    /departureBoard/{id} gives you a list of Board items, which are defined like so:

    Board {
        name (string): ,
        type (string): ,
        boardId (string): ,
        stopId (string): ,
        stopName (string): ,
        dateTime (string): ,
        origin (string): ,
        track (string): ,
        detailsId (string):
    }
    

    The detailsId is what you can use to hit the /journeyDetails/{id} endpoint. So the minimum working code looks like this (note the call to urllib.parse.quote):

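    import requests
    import urllib.parse  # import the submodule explicitly; a bare "import urllib" does not guarantee it is loaded

    header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
    departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

    # Take the detailsId of the first board item and percent-encode it before
    # putting it into the URL path. safe='' also escapes '/', which quote()
    # leaves untouched by default.
    journey_id = departure_data.json()[0]['detailsId']
    journey_details = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.parse.quote(journey_id, safe=''), headers=header)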
    

    The value of journey_id is itself URL-encoded and decodes to something that looks like part of a URL (a path followed by a query string):

    urllib.parse.unquote(journey_id)
    # -> '564552/203236/867650/245641/80?station_evaId=8098160'
    

    So it looks as if you could simply paste the value into the URL as-is to make further requests, but that's a misconception.

    Treat the ID as an opaque plain text value that you need to encode, like you would encode any other arbitrary value before using it in a URL.

    When you quote the value, each percent sign is escaped as %25, which leads to the longer value (first line: the ID as returned by the API; second line: the same ID after quoting):

    '564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'
    '564552%252F203236%252F867650%252F245641%252F80%253fstation_evaId%253D8098160'
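The relationship between the two values can be checked with the standard library alone. This is a small sketch using the journey ID from the question; that the server decodes the path exactly once is an assumption inferred from the observed 404/200 behaviour, not something the API documentation states.

```python
from urllib.parse import quote, unquote

# The detailsId exactly as returned by /departureBoard/{id} (taken from the question).
details_id = '782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

# Encoding the opaque value turns every '%' into '%25'.
path_segment = quote(details_id, safe='')

# Assumption: the server decodes the path once and so recovers the ID as issued.
decoded_once = unquote(path_segment)

# Decoding a second time yields the path-like string, which is not the ID.
decoded_twice = unquote(decoded_once)

print(path_segment)
print(decoded_once == details_id)
print(decoded_twice)
```

Sending details_id unencoded means a single decoding step on the server side produces the slash-separated string instead of the ID, which would explain the 404.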
    

    Since the Deutsche Bahn API is self-documenting through Swagger, it might be easiest to install a Swagger client and let it create an API wrapper for you (see their swagger.json). pyswagger looks usable, but there are others to try.

    This way, you could concentrate on making API requests and getting data, while low-level plumbing such as URL-encoding and even authorization would happen transparently in the background.
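If a generated client is more than you need, the same separation of concerns can be sketched by hand. The class below is a hypothetical thin wrapper, not a generated Swagger client and not part of any Deutsche Bahn library; the class name, the method names and the 'YOUR_TOKEN' placeholder are assumptions. It uses only the standard library, so a request can be built and inspected without sending anything.

```python
import json
import urllib.request
from urllib.parse import quote


class FahrplanClient:
    """Hypothetical wrapper that hides URL-encoding and authorization."""

    BASE_URL = 'https://api.deutschebahn.com/fahrplan-plus/v1'

    def __init__(self, token):
        # The same headers are attached to every request.
        self._headers = {
            'Authorization': 'Bearer ' + token,
            'Accept': 'application/json',
        }

    def build_request(self, endpoint, value):
        # Percent-encode the path parameter; safe='' escapes '/' as well.
        url = '{}/{}/{}'.format(self.BASE_URL, endpoint, quote(value, safe=''))
        return urllib.request.Request(url, headers=self._headers)

    def journey_details(self, details_id):
        # Performs the actual network call and parses the JSON response.
        request = self.build_request('journeyDetails', details_id)
        with urllib.request.urlopen(request) as response:
            return json.load(response)


# Build (but do not send) a request for the journey ID from the question.
client = FahrplanClient('YOUR_TOKEN')
request = client.build_request(
    'journeyDetails',
    '782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160',
)
print(request.full_url)
```

Calling client.journey_details(...) with a valid token would perform the request; callers then pass the detailsId exactly as received and never deal with quoting themselves.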