Tags: python, web-scraping, socket.io

Connect to a Socket.IO XHR request with Python


I'm trying to retrieve some data from here, namely games and odds. I know the data is in the response of this GET request as shown in the network tab below:

[Screenshot: browser network tab showing the request]

However, there is some WebSocket protocol involved, and I'm not sure how to handle it.

I should mention that I'm new to Python (I usually code in R) and to WebSockets, but I've managed to find the Socket.IO path in the page's code, so here is what I've tried:

import socketio

sio = socketio.Client(logger=True, engineio_logger=True)

@sio.event
def connect():
    print('connected!')
    sio.emit('add user', 'Testing')

@sio.event
def print_message(sid):
    print("Socket ID: ", sid)

@sio.event
def disconnect():
    print('disconnected!')

sio.connect('https://sports-eu-west-3.winamax.fr', transports=['websocket'], socketio_path='/uof-sports-server/socket.io')
sio.wait()

I'm able to connect, but I'm not sure where to go next to get the actual response of the GET request above.

Any hints are appreciated.


Solution

  • I believe you were quite close; you just need to emit events that the server side also knows about. Most of the data exchange there goes through "m" events.
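python-socketio does all of the framing for you, but it may help to see what emitting an "m" event amounts to on the wire, and why the GET response in the network tab looks odd. A minimal sketch, assuming the server speaks what Socket.IO 2.x uses (Engine.IO protocol v3, Socket.IO protocol v4), the default namespace "/", no acknowledgement ids, and the text (non-binary) polling encoding. The function names are illustrative only, not part of any library:

```python
import json

# Engine.IO v3 packet type "4" is MESSAGE; Socket.IO protocol v4 packet type "2" is EVENT.
# Together they give the "42" prefix of event frames in the default namespace "/".
EIO_MESSAGE = "4"
SIO_EVENT = "2"


def encode_event(name, payload):
    """Encode a default-namespace event without an ack id, e.g. 42["m",{...}]."""
    body = json.dumps([name, payload], separators=(",", ":"), ensure_ascii=False)
    return EIO_MESSAGE + SIO_EVENT + body


def decode_event(frame):
    """Decode a default-namespace EVENT frame (no ack id) into (name, first argument)."""
    prefix = EIO_MESSAGE + SIO_EVENT
    if not frame.startswith(prefix):
        raise ValueError(f"not a default-namespace EVENT frame: {frame[:10]!r}")
    name, *args = json.loads(frame[len(prefix):])
    return name, (args[0] if args else None)


def split_polling_payload(payload):
    """Split an Engine.IO v3 text polling body "<len>:<packet><len>:<packet>..." into packets.

    Lengths count characters; this assumes no characters outside the BMP,
    where JavaScript string lengths and Python str lengths differ.
    """
    packets = []
    pos = 0
    while pos < len(payload):
        colon = payload.index(":", pos)
        start = colon + 1
        end = start + int(payload[pos:colon])
        packets.append(payload[start:end])
        pos = end
    return packets


frame = encode_event("m", {"route": "tournament:4", "requestId": "abc"})
print(frame)  # 42["m",{"route":"tournament:4","requestId":"abc"}]
print(decode_event(frame))
print(split_polling_payload(f"2:40{len(frame)}:{frame}"))
```

Over the websocket transport each frame is one packet such as `42[...]`; over XHR polling several packets are concatenated with length prefixes, which is roughly what the response of the GET request in the question contains.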

    I didn't test with the current python-socketio release, but according to its version compatibility table we should use v4.x here: the target Socket.IO version is probably v2.5.0, judging from the header of the bundled uof-sports-server/socket.io/socket.io.js.
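If you want to repeat that version check yourself, here is a small sketch. It assumes the bundle still lives at the URL below (built from the host and path used in the answer) and that its banner contains a line like ` * Socket.IO v2.5.0`; the helper names are hypothetical. The mapping encodes python-socketio's compatibility table (JS 1.x/2.x need python-socketio 4.x, JS 3.x and newer need 5.x):

```python
import re
import urllib.request

# Assumed location of the bundled client, built from the host and path used in the answer.
BUNDLE_URL = "https://sports-eu-west-3.winamax.fr/uof-sports-server/socket.io/socket.io.js"


def fetch_bundle_head(url=BUNDLE_URL, size=512):
    """Download only the first bytes of the bundle, where the banner comment sits."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read(size).decode("utf-8", errors="replace")


def bundled_version(text):
    """Extract (major, minor, patch) from a banner line such as " * Socket.IO v2.5.0"."""
    match = re.search(r"Socket\.IO v(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def python_socketio_major(js_version):
    """Map a JS Socket.IO server version to the python-socketio major release to use.

    Follows python-socketio's version compatibility table:
    JS 1.x and 2.x -> python-socketio 4.x, JS 3.x and newer -> python-socketio 5.x.
    """
    major = js_version[0]
    if major in (1, 2):
        return 4
    if major >= 3:
        return 5
    raise ValueError(f"Socket.IO {major}.x is not supported by python-socketio")


# Offline example with a banner in the assumed format.
banner = "/*!\n * Socket.IO v2.5.0\n * Released under the MIT License.\n */"
print(bundled_version(banner), python_socketio_major(bundled_version(banner)))
```

Against the live site this would be `python_socketio_major(bundled_version(fetch_bundle_head()))`, provided the bundle is still served there.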

    # /// script
    # requires-python = ">=3.10"
    # dependencies = [
    #     "python-socketio[client]<5.0",
    # ]
    # ///
    import socketio
    import pprint
    import uuid
    
    sio = socketio.Client(
        logger=True,
        # engineio_logger=True
    )
    requestId = str(uuid.uuid4())
    
    # connect & emit "m" event
    @sio.event
    def connect():
        print("connected!")
        data = dict(route="tournament:4", requestId=requestId)
        print("sending", data)
        sio.emit("m", data)
    
    # wait for "m" event with matching requestId
    @sio.on("m")
    def m_response(data):
        if data.get("requestId") == requestId:
            pprint.pp(data.keys())
            pprint.pp([match["title"] for match in data["matches"].values()])
            # only disconnect once our own response has arrived
            sio.disconnect()
    
    @sio.event
    def disconnect():
        print("disconnected!")
    
    sio.connect(
        url="https://sports-eu-west-3.winamax.fr",
        transports=["websocket"],
        socketio_path="/uof-sports-server/socket.io/",
    )
    sio.wait()
    

    (You can use uv to resolve the dependencies from the script's inline metadata.)

    $ uv run winamax_socketio.py
    Engine.IO connection established
    Namespace / is connected
    connected!
    sending {'route': 'tournament:4', 'requestId': '6e9ee3d4-0bcb-45f4-ab6c-652379f234cb'}
    Emitting event "m" [/]
    Received event "m" [/]
    dict_keys(['tournaments', 'matches', 'bets', 'outcomes', 'odds', 'requestId'])
    ['Angers - Rennes',
     'Auxerre - Montpellier',
     'Le Havre - Nantes',
     'Strasbourg - Lyon',
     'Toulouse - Brest',
     'Saint-Étienne - Paris SG',
     'Reims - Marseille',
     'Lille - Lens',
     'Monaco - Nice',
     'Marseille - Toulouse',
     'Nice - Nantes',
     'Brest - Monaco',
     'Montpellier - Le Havre',
     'Lyon - Lille',
     'Lens - Saint-Étienne',
     'Paris SG - Angers',
     'Reims - Strasbourg',
     'Rennes - Auxerre',
     "Ligue 1 McDonald's® 2024/25"]
    Engine.IO connection dropped
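Since the question mentions R, you may want these sections as tables. The answer only shows that `matches` is a dict whose values are match objects; this sketch assumes `tournaments`, `bets`, `outcomes` and `odds` are also dicts keyed by id, which I haven't verified. It turns each dict-valued section into a list of rows with an `id` column, and the sample data is made up purely to show the shapes:

```python
def to_rows(section):
    """Turn a dict keyed by id into a list of flat row dicts with an "id" column."""
    rows = []
    for key, value in section.items():
        if isinstance(value, dict):
            # Nested objects become one row each; a field named "id" inside the
            # object, if present, takes precedence over the dict key.
            rows.append({"id": key, **value})
        else:
            # Scalars (e.g. if a section maps ids straight to prices) get a "value" column.
            rows.append({"id": key, "value": value})
    return rows


def response_tables(data):
    """One row list per dict-valued section; non-dict entries such as requestId are skipped."""
    return {name: to_rows(section) for name, section in data.items() if isinstance(section, dict)}


sample = {  # made-up data, only to show the shapes
    "matches": {"1": {"title": "Angers - Rennes"}},
    "odds": {"10": 1.85},
    "requestId": "abc",
}
for name, rows in response_tables(sample).items():
    print(name, rows)
```

Inside `m_response` you could call `response_tables(data)` and pass each row list to `pandas.DataFrame(...)`, which accepts a list of dicts, to get something close to an R data frame.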
    

    To help with such tasks, and to check communication flows against known working examples, you might want to look into debugging proxies (mitmproxy, Telerik Fiddler, HTTP Toolkit, ...).
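The script in the answer waits for a single requestId. If you need several routes over the same connection (for example more than one tournament; any route other than `tournament:4` is a guess on my part), one option is to keep a table of pending requestIds and disconnect once all of them have answered. A sketch with a hypothetical `PendingRequests` helper; only the payload shape `dict(route=..., requestId=...)` comes from the answer:

```python
import uuid


class PendingRequests:
    """Correlate "m" responses with the routes requested, via requestId."""

    def __init__(self, routes):
        # dict.fromkeys drops duplicate routes while keeping their order.
        self.route_for = {str(uuid.uuid4()): route for route in dict.fromkeys(routes)}
        self.results = {}  # route -> response data

    def payloads(self):
        """Payloads in the same shape as the answer's dict(route=..., requestId=...)."""
        return [dict(route=route, requestId=rid) for rid, route in self.route_for.items()]

    def handle(self, data):
        """Store data if it answers one of our requests; return True once all have answered."""
        route = self.route_for.get(data.get("requestId"))
        if route is not None:
            self.results[route] = data
        return self.done()

    def done(self):
        return len(self.results) == len(self.route_for)


# Hypothetical wiring into the answer's client (not executed here):
#   pending = PendingRequests(["tournament:4"])  # add further routes here (guesses)
#   in connect():     for payload in pending.payloads(): sio.emit("m", payload)
#   in m_response():  if pending.handle(data): sio.disconnect()
```

Unrelated "m" events are simply ignored, and `pending.results` ends up holding one response per route.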