
Escape splash:select query selector within lua code


I can't figure out the correct way to escape the periods in a splash:select selector.

I have a Splash request in Scrapy that uses Lua to wait for a specific element. The element's id contains periods, and I can't seem to escape them correctly. I have tried both single and double backslashes (\ and \\).

lua_script = '''
    function main(splash)
        splash:set_user_agent(splash.args.ua)
        assert(splash:go(splash.args.url))
        while not splash:select('div#some.id.here') do
            splash:wait(0.1)
        end
        return {html=splash:html()}
    end
'''

The expected result is the fully loaded HTML of the requested page.

The actual result is:

WARNING: Bad request to Splash: {'description': 'Error happened while executing Lua script', 'error': 400, 'type': 'ScriptError', 'info': {'error': "invalid escape sequence near '\\.'", 'source': '[string "..."]', 'message': '[string "..."]:5: invalid escape sequence near \'\\.\'', 'line_number': 5, 'type': 'LUA_INIT_ERROR'}}

This happens with both \ and \\.

If I instead try to escape the quotes of the string inside splash:select, like this:

splash:select(\'div#some.id.here\')

The script then loops forever. I believe this is a step in the right direction: the Lua code now runs, but the selector looks for a div with the id "some" and the classes "id" and "here" instead of a div whose id contains periods.


Solution

  • You have a Python string that contains Lua code.

    'splash:select(\'div#some.id.here\')'
    

    The CSS selector passed to splash:select needs each . in the id escaped; an unescaped . starts a class selector.

    So we need to prepend a backslash: \.

    To avoid an invalid escape sequence \. error in Lua, we have to escape that backslash by prepending another one: \\.

    As we're still inside a Python string, we have to escape both backslashes again, resulting in a total of four backslashes.

    'splash:select(\'div#some\\\\.id\\\\.here\')'
    

    Python '\\\\.' will be seen as '\\.' by Lua, which ends up as '\.' in your splash:select call.

    I hope this makes sense. I cannot test it.
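
The escaping layers described above can be checked in plain Python without running Splash. This is a minimal sketch: the variable names are illustrative, and the `replace` call is only a rough stand-in for how Lua unescapes `\\` in a string literal, not Lua's actual parser. It also shows that a raw Python string (`r'...'`) needs only the two backslashes Lua requires.

```python
# Regular Python string: every backslash meant for Lua is doubled again,
# so each period is preceded by four backslashes in the Python source.
lua_select = 'splash:select(\'div#some\\\\.id\\\\.here\')'

# Raw Python string: Python leaves backslashes untouched, so only
# Lua's own escaping (two backslashes per period) has to be written.
lua_select_raw = r"splash:select('div#some\\.id\\.here')"

# Both spellings produce the same Lua source text.
assert lua_select == lua_select_raw

# Rough stand-in (assumption) for Lua's string unescaping: each pair of
# backslashes in the Lua literal collapses to a single backslash.
css_seen_by_splash = lua_select.replace('\\\\', '\\')

print(lua_select)          # the text Lua receives
print(css_seen_by_splash)  # the selector text after Lua's unescaping
```

With the question's triple-quoted `lua_script`, the same rule applies: either write four backslashes before each period, or prefix the string with `r` and write two.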