Tags: python, selenium, for-loop, web-scraping, google-scholar

Web scraping multiple Google Scholar pages in Python


I want to scrape multiple Google Scholar user profiles (publications, journals, citations, etc.). I have already written the Python code for scraping a single user profile given its URL. Now, suppose I have 100 names and the corresponding URLs in an Excel file like this:

name       link

Autor      https://scholar.google.com/citations?user=cp-8uaAAAAAJ&hl=en
Dorn       https://scholar.google.com/citations?user=w3Dri00AAAAJ&hl=en
Hanson     https://scholar.google.com/citations?user=nMtHiQsAAAAJ&hl=en
Borjas     https://scholar.google.com/citations?user=Patm-BEAAAAJ&hl=en
....

My question is: can I read the 'link' column of this file and write a for loop over the URLs, so that I can scrape each of these profiles and append the results to the same file? It seems a bit far-fetched, but I hope there is a way to do so. Thanks in advance!


Solution

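  • You can use pandas.read_csv() to load the file and then select a specific column from it (if the data is still an Excel workbook rather than a CSV export, pandas.read_excel() plays the same role). For example: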

    import pandas as pd

    # Load the spreadsheet exported as CSV; it must have a 'link' column.
    df = pd.read_csv('data.csv')

    # Pull the 'link' column out as a plain Python list of URLs.
    links = df['link'].tolist()

    print(links)
    

    This extracts only the link column and collects its values in a list, which you can then loop over. To learn more, refer to the pandas documentation.
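
To cover the second half of the question (scraping each URL and appending the results to the same table), here is a minimal sketch. It assumes your existing scraper can be wrapped in a function that takes one URL and returns a dict of fields. The names `scrape_profile` and `scrape_all`, the placeholder fields, and the output file name are hypothetical and not part of the original code.

```python
import pandas as pd


def scrape_profile(url):
    """Hypothetical stand-in for your existing single-profile scraper."""
    # Replace this body with your real scraping code. It should return
    # a dict with one entry per field you want to store for the profile.
    return {"publications": None, "citations": None}


def scrape_all(df, scraper=scrape_profile, link_column="link"):
    """Call scraper on every URL in df[link_column] and append the results as new columns."""
    # One result dict per row, in the same order as the rows of df.
    results = [scraper(url) for url in df[link_column]]
    # Turn the dicts into columns, reusing df's index so the rows line up.
    scraped = pd.DataFrame(results, index=df.index)
    # Place the new columns to the right of the original ones.
    return pd.concat([df, scraped], axis=1)


# Example usage (assumes data.csv has 'name' and 'link' columns):
# df = pd.read_csv("data.csv")
# scrape_all(df).to_csv("data_scraped.csv", index=False)
```

Writing to a separate file such as data_scraped.csv keeps the original input intact if a run fails partway. Passing the original file name to to_csv() instead would overwrite it in place.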