python, python-2.7, cefpython

Need to get HTML source as a string with CEFPython


I am trying to get the HTML source of a web URL as a string using CEFPython. I want to crawl the main frame's source content and get the string in

def save_screenshot(browser):    
    # Browser object provides GetUserData/SetUserData methods
    # for storing custom data associated with browser. The
    # "OnPaint.buffer_string" data is set in RenderHandler.OnPaint.
    buffer_string = browser.GetUserData("OnPaint.buffer_string")
    if not buffer_string:
        raise Exception("buffer_string is empty, OnPaint never called?")
    mainFrame = browser.GetMainFrame()
    print("Main frame is ", mainFrame)
    # print("buffer string" ,buffer_string)

    # visitor object
    visitorObj = cef_string()
    temp = mainFrame.GetSource(visitorObj).GetString()
    print("temp : ", temp)

    visitorText = mainFrame.GetText(temp)
    siteHTML = mainFrame.GetSource(visitorText)
    print("siteHTML is ", siteHTML)

Problem: the code returns nothing for siteHTML.


Solution

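  • mainFrame.GetSource(visitor) is asynchronous: it returns immediately, and the source is delivered later to the visitor's Visit() method. Its return value is not the source, so you cannot call GetString() on it.

    This is the way to do it; unfortunately, you need to think asynchronously: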

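    class Visitor(object):
        def Visit(self, value):
            # Called later, once the source is available.
            print("This is the HTML source:")
            print(value)

    # Keep a reference to myvisitor until Visit() has been called.
    myvisitor = Visitor()
    mainFrame = browser.GetMainFrame()
    mainFrame.GetSource(myvisitor)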
    

    One more thing to beware of: the visitor object myvisitor in the above example is held by GetSource() only as a weak reference. In other words, you must keep that object alive until the source is passed back. If you put the last three lines of the above snippet in a function, you have to make sure the function does not return until the job is done, or keep the visitor somewhere that outlives the function, such as an attribute of a long-lived object.
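
The weak-reference pitfall can be illustrated without CEF at all. The following is only a sketch: FakeFrame is a hypothetical stand-in written for this example (it is not part of cefpython), and Crawler, request_and_forget and the sink list are likewise made-up names. It assumes the frame behaves as described above, that is, GetSource() returns immediately, keeps only a weak reference to the visitor, and calls Visit() later.

```python
import weakref


class FakeFrame(object):
    """Hypothetical stand-in for a CEF frame; NOT part of cefpython."""

    def __init__(self, html):
        self._html = html
        self._pending = []

    def GetSource(self, visitor):
        # Returns immediately and stores only a weak reference.
        self._pending.append(weakref.ref(visitor))

    def deliver(self):
        # Simulates the browser calling back at some later point.
        for ref in self._pending:
            visitor = ref()
            if visitor is not None:  # the visitor may already be gone
                visitor.Visit(self._html)
        self._pending = []


class Visitor(object):
    def __init__(self, sink):
        self.sink = sink  # list that collects the delivered source

    def Visit(self, value):
        self.sink.append(value)


def request_and_forget(frame, sink):
    # Anti-pattern: the visitor is a local and is dropped on return.
    visitor = Visitor(sink)
    frame.GetSource(visitor)


class Crawler(object):
    """Keeps the visitor alive by storing it as an attribute."""

    def __init__(self, frame, sink):
        self.frame = frame
        self.visitor = Visitor(sink)  # lives as long as the Crawler

    def request_source(self):
        self.frame.GetSource(self.visitor)


lost = []
frame_a = FakeFrame("<html>a</html>")
request_and_forget(frame_a, lost)
frame_a.deliver()
print("forgotten visitor received %d result(s)" % len(lost))

kept = []
frame_b = FakeFrame("<html>b</html>")
crawler = Crawler(frame_b, kept)
crawler.request_source()
frame_b.deliver()
print("kept visitor received %d result(s)" % len(kept))
```

Under CPython's reference counting, the visitor created inside request_and_forget() is freed as soon as the function returns, so its list stays empty, while the visitor owned by the Crawler instance is still alive when the callback fires. With the real library, the same idea would mean storing the visitor on an object that lives at least until Visit() has run.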