
Extracting links from a website with Selenium, bs4, and Python


The title might make it look like this question has already been asked, but I had no luck finding an answer.

I need help with a link-extracting program in Python.

It actually works: it finds all <a> elements on a webpage, takes their href values, and puts them in an array. Then it exports the array to a CSV file, which is what I want.

But there is one thing I can't figure out.

The website is dynamic, so I am using the Selenium webdriver to get the JavaScript-rendered results.

The code for the program is pretty simple. I open the website with the webdriver and get its content. Then I get all the links with:

results = driver.find_elements_by_tag_name('a')

Then I loop through the results with a for loop and get each href with:

result.get_attribute("href")

I store results in an array and then print them out.
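For reference, the steps described above could be sketched roughly as follows. This is only an illustration, not the original program: the helper names (collect_hrefs, write_csv, scrape), the choice of Chrome, and the CSV layout are all assumptions. It uses driver.find_elements(By.TAG_NAME, "a"), since newer Selenium 4 releases no longer provide find_elements_by_tag_name.

```python
import csv


def collect_hrefs(elements):
    """Return the href of every element that has one (hypothetical helper)."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip anchors with no href
            hrefs.append(href)
    return hrefs


def write_csv(hrefs, fileobj):
    """Write a header row, then one href per row, to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["href"])
    for href in hrefs:
        writer.writerow([href])


def scrape(url, out_path):
    """Open the page, collect all link hrefs, and export them to a CSV file."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome; another WebDriver works too
    try:
        driver.get(url)
        # Selenium 4 equivalent of find_elements_by_tag_name('a')
        elements = driver.find_elements(By.TAG_NAME, "a")
        hrefs = collect_hrefs(elements)
    finally:
        driver.quit()

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        write_csv(hrefs, f)
```

Calling scrape("https://example.com", "links.csv") would then open the page and write the collected links; both arguments are placeholders.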

The problem is that I can't get the text of the links. For example, given:

<a href="https://www.google.com">This leads to Google</a>

Is there any way to get the string 'This leads to Google'?

I need it for every link that is stored in the array.

Thank you for your time.

UPDATE

It seems this only works for dynamic links. I just noticed this, and it is really strange. For hard-coded links it returns an empty string, while for a dynamic link it returns its name.


Solution

  • Okay. So. The answer is that instead of using .text you should use get_attribute("textContent"). It works better than get_attribute("innerHTML").

    Thanks KunduK for this answer. You saved my day :)
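
A minimal sketch of that fix, assuming the elements come from Selenium's find_elements. Selenium's .text returns only text that is visibly rendered, which is probably why some links came back empty, whereas the textContent property holds the text whether or not it is displayed. The helper names link_name and links_with_names are made up for illustration, and collapsing whitespace is an extra convenience, not part of the original answer.

```python
def link_name(element):
    """Return the link text via textContent, with whitespace collapsed."""
    raw = element.get_attribute("textContent") or ""
    return " ".join(raw.split())


def links_with_names(elements):
    """Return (href, name) pairs for every element that has an href."""
    pairs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:
            pairs.append((href, link_name(element)))
    return pairs
```

For the <a> example from the question, links_with_names would give the pair ("https://www.google.com", "This leads to Google"), which can be written to the CSV as two columns.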