The context is SpringerLink, for example the GTM book series.
I want to get the information located at the bottom of each book's webpage.
All I need is the E-ISBN shown on each page.
Is there a way (not limited to Selenium) to enumerate the book pages and extract this information?
For this simple task you can use either Selenium or BeautifulSoup, but the latter is easier and faster, so let's use it to get the titles and E-ISBN codes.
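First install BeautifulSoup with the command pip install beautifulsoup4 (the code below also imports requests, which can be installed the same way with pip install requests).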
Notice that in the book list each book has an eBook link of the form https://www.springer.com/book/9783031256325, where 9783031256325 is the E-ISBN without the - characters.
So we can get the E-ISBN codes directly from those URLs, without needing to load a new page for each book:
import requests
from bs4 import BeautifulSoup
url = 'https://www.springer.com/series/136/books'
soup = BeautifulSoup(requests.get(url).text, "html.parser")
titles = [title.text.strip() for title in soup.select('.c-card__title')]
EISBN = []
for a in soup.select('ul:last-child .c-meta__item:last-child a'):
    # a['href'] is something like https://www.springer.com/book/9783031256325
    c = a['href'].split('/')[-1]
    # insert four '-' in the number 9783031256325 to create the E-ISBN code
    EISBN.append(f'{c[:3]}-{c[3]}-{c[4:7]}-{c[7:12]}-{c[-1]}')

for isbn, title in zip(EISBN, titles):
    print(isbn, title)
Output:
978-3-031-25632-5 Random Walks on Infinite Groups
978-3-031-19707-9 Drinfeld Modules
978-3-031-13379-4 Partial Differential Equations
978-3-031-00943-3 Stationary Processes and Discrete Parameter Markov Processes
978-3-031-14205-5 Measure Theory, Probability, and Stochastic Processes
978-3-030-56694-4 Quaternion Algebras
978-3-030-73839-6 Mathematical Logic
978-3-030-71250-1 Lessons in Enumerative Combinatorics
978-3-030-35118-2 Basic Representation Theory of Algebras
978-3-030-59242-4 Ergodic Dynamics
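The URL-to-E-ISBN step can also be isolated in a small helper, which makes it easy to check without touching the network. This is only a sketch: the function names format_eisbn and eisbn_from_url are made up for illustration, and the 3-1-3-5-1 grouping is simply the pattern visible in the output above (Springer prefixes 978-3-030 and 978-3-031), not a general ISBN hyphenation rule.

```python
def format_eisbn(digits: str) -> str:
    """Hyphenate a 13-digit ISBN as 3-1-3-5-1 groups.

    Assumption: this grouping matches the Springer codes shown above;
    other publishers use different group lengths.
    """
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError(f"not a 13-digit ISBN: {digits!r}")
    return f"{digits[:3]}-{digits[3]}-{digits[4:7]}-{digits[7:12]}-{digits[12]}"


def eisbn_from_url(href: str) -> str:
    """Take the last path segment of an eBook link and hyphenate it."""
    return format_eisbn(href.rstrip("/").split("/")[-1])


print(eisbn_from_url("https://www.springer.com/book/9783031256325"))
# prints 978-3-031-25632-5
```

Raising ValueError on anything that is not exactly 13 digits is a deliberate choice here: if the CSS selector ever picks up an unrelated link, the script fails loudly instead of printing a malformed code.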
Alternatively, the following method loads the details page for each book and extracts the E-ISBN from there:
# reuses requests, BeautifulSoup and url from the first snippet
soup = BeautifulSoup(requests.get(url).text, "html.parser")
books = soup.select('a[data-track-label^="article"]')
titles, EISBN = [], []
for book in books:
    titles.append(book.text.strip())
    # one extra request per book: load its details page
    soup_book = BeautifulSoup(requests.get(book['href']).text, "html.parser")
    # the value element inside the <p> that holds the electronic ISBN label
    EISBN.append(soup_book.select('p:has(span[data-test=electronic_isbn_publication_date]) .c-bibliographic-information__value')[0].text)
In case you are wondering, p:has(span[data-test=electronic_isbn_publication_date]) selects the p element that contains a span with the attribute data-test=electronic_isbn_publication_date.
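To see how the :has() selector behaves, here is a small offline sketch. The HTML fragment is an assumption modelled on the selector above, not a copy of Springer's real markup, and the softcover value is a made-up placeholder; only the electronic ISBN is taken from the output shown earlier.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the bibliographic block of a book page.
html = """
<div>
  <p>
    <span data-test="softcover_isbn_publication_date">Softcover ISBN</span>
    <span class="c-bibliographic-information__value">000-0-000-00000-0</span>
  </p>
  <p>
    <span data-test="electronic_isbn_publication_date">eBook ISBN</span>
    <span class="c-bibliographic-information__value">978-3-031-25632-5</span>
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# p:has(...) keeps only the <p> containing the electronic ISBN label;
# the descendant selector then picks the value element inside that <p>.
selector = (
    "p:has(span[data-test=electronic_isbn_publication_date]) "
    ".c-bibliographic-information__value"
)
match = soup.select_one(selector)
eisbn = match.text.strip() if match else None
print(eisbn)
# prints 978-3-031-25632-5
```

Using select_one with a None check instead of indexing [0] avoids an IndexError on pages that have no electronic ISBN entry.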