python beautifulsoup canvas-lms

Extract iframes using BeautifulSoup with Python


I use the Canvas LMS and I want to extract the iframe from some pages so that I can change its src attribute. I tried the following:

# some code
soup = BeautifulSoup(page_html, 'html.parser')
pretty_html = soup.prettify()
soup = BeautifulSoup(pretty_html, 'html.parser')
iframe = soup.find('iframe')
print(iframe)

But the result is unexpected. I got this output:

None
None
<iframe allowfullscreen="" frameborder="0" height="276" mozallowfullscreen="" scrolling="no" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490"></iframe>
None
None
None
None
None
None

I was expecting to get only this:

<iframe allowfullscreen="" frameborder="0" height="276" mozallowfullscreen="" scrolling="no" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490"></iframe>

The page HTML I received has only one iframe, so what is wrong with the result? I think I should receive only one iframe object, but it appears that I receive a list. Can someone clarify what I am doing wrong?


Solution

  • I discovered how to fix the problem.

    I changed the code from:

    iframe = soup.find('iframe')
    

    to

    iframes = soup.find_all('iframe')
    

    Then, instead of getting None as a response, I began to receive [], an empty list, whenever no iframe was found.

    I tested it using:

    if iframes:
        print(iframes[0]['src'])
    

    I got the content of src using iframes[0]['src'].
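
The repeated None lines in the question suggest that the snippet runs once per page (an assumption, since the surrounding loop is not shown), and that most pages contain no iframe: find returns None when nothing matches, while find_all returns an empty list. Below is a minimal sketch of that behavior that also covers the original goal of changing src. The pages list, the example.com URLs, and the helper name replace_iframe_src are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page bodies; in the question these would come from Canvas.
pages = [
    "<p>No video here</p>",
    '<p>Intro</p><iframe src="https://example.com/old/1" width="490"></iframe>',
]


def replace_iframe_src(page_html, new_src):
    """Return page_html with the src of every iframe set to new_src."""
    soup = BeautifulSoup(page_html, "html.parser")
    # find_all always returns a list, so a page without an iframe
    # simply skips the loop instead of producing None.
    for iframe in soup.find_all("iframe"):
        iframe["src"] = new_src
    return str(soup)


updated = [replace_iframe_src(html, "https://example.com/new/1") for html in pages]
for html in updated:
    print(html)
```

Iterating over find_all avoids both the None check and the index lookup, and it keeps working if a page ever contains more than one iframe.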