im currently trying to scrape a google sheet. But I have serious troubles because I cant just use requests to get the page data because only the first 100 rows are loaded. So I tried to use selenium but google has pretty strong anti scrape techniques. I cant get good data out of it. If someone knows a different aproach or a way to fix the selenium method let me know.
You can transform the URL of the google sheet (to have export in Json format). Then use json
/pandas
to load the data into a Dataframe:
import json
import requests
import pandas as pd
# transform the URL to this:
url = 'https://docs.google.com/spreadsheets/d/1DAQ__r9RNx4mHgcDoXvzda9-0d_yeP7hqXdRUZ67EtM/gviz/tq?tqx=out:json'
data = requests.get(url).text
data = data[data.index('{'):-2]
data = json.loads(data)['table']
# print(json.dumps(data['table'], indent=4))
df = pd.DataFrame([[c['v'] if c else None for c in r['c']] for r in data['rows']], columns=None)[range(4)]
df.columns = ['Item', 'Link', 'Price (Yuan)', 'Price (USD)']
print(df.head())
print()
print(len(df))
Prints:
Item Link Price (Yuan) Price (USD)
0 NIKE VAPORMAX 2021 https://pandabuy.allapp.link/clvm6i8gpf6qoog017eg ¥178 $27
1 OFF WHITE DUNKS\n(29 COLORWAYS) https://pandabuy.allapp.link/clvm6k0gpf6qoog017fg ¥160 $25
2 OFF WHITE DUNKS\nUNIVERSITY RED https://pandabuy.allapp.link/clvm6nggpf6qoog017gg ¥400 $63
3 OFF WHITE DUNKS\nPINE GREEN https://pandabuy.allapp.link/clvm6q7m8nglvnuo8urg ¥400 $63
4 OFF WHITE LOT 50 DUNKS https://pandabuy.allapp.link/clvm6s90b4mhqi3p2010 ¥180 $28
1319
EDIT: To load images - use beautifulsoup and load the sheet from preview URL
import requests
from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://docs.google.com/spreadsheets/d/1DAQ__r9RNx4mHgcDoXvzda9-0d_yeP7hqXdRUZ67EtM/preview/sheet?gid=0'
table = BeautifulSoup(requests.get(url).content, 'html.parser').table
# convert all img links to text
for img in table.select('td img[src]'):
img.replace_with(img['src'])
df = pd.read_html(StringIO(str(table)))[0]
print(df.head(10))
print(len(df))
Prints:
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5
0 1 Tiktok Branded Goods CTRL + F to browse for items CTRL + F to browse for items CTRL + F to browse for items
1 2 Sign up to be a pandabuy affiliate Sign up to be a pandabuy affiliate Sign up to be a pandabuy affiliate Sign up to be a pandabuy affiliate Sign up to be a pandabuy affiliate
2 3 SHOES SHOES SHOES SHOES SHOES
3 4 ITEM LINK PRICE (¥) (YUAN) PRICE ($) (USD) PICTURE
4 5 JORDAN 4 KX/KZ/K2 Batch https://pandabuy.allapp.link/clvlhrggpf6qoog012sg ¥280 $41 https://lh7-rt.googleusercontent.com/sheetsz/AHOq17EttQoQ-f8s57IHnPjcgralFtsUA4x23UTYLye_lTx1K4Tfkwi3A2sg8KKrFlVZaQIxLcjOouPJn5FbrVB5NGe7k88hgYlk1WPuk-ZPIFocGJRQQMI7bknBYzTryoIVOgc-k0qMyYBVWf7hmhMiV4s=w243-h128?key=kbwyiYmsDs8YaHncN15rsw
5 6 Best JORDAN 4 BLACK CATS Batch https://pandabuy.allapp.link/clvlhvggpf6qoog012t0 ¥450 $67 https://lh7-rt.googleusercontent.com/sheetsz/AHOq17FEusJVTfIr5gAv8gMh3BJEsHN6zjjJaYejr_bWNBJMUyOZcaCUhUm2dQNS2EM9--8dZ8gHaKyGj0G3k4hAwSkba21x9oO1u_QiOXnGi7o94u-VQD2-PkjhS66MFjpBsKn4mh5v74m53-9K3c1a-cM=w243-h128?key=kbwyiYmsDs8YaHncN15rsw
6 7 ARCHEO DUNKS https://pandabuy.allapp.link/clvlihggpf6qoog01340 ¥351 $53 https://lh7-rt.googleusercontent.com/sheetsz/AHOq17H-O6FAFrpIO6CKerzyeArQYyOXdUdeRO_CddZpmgbAbaHJfPmtzf64NheeENxvAf2mL1m88N3BF7jnPn_4POuNUXHCvkHkZFeHPqbiZSDvKv-j8Ch4e7hpCkzKGuOzyst_J9I1RdSItfoSJqhhNCvP=w243-h129?key=kbwyiYmsDs8YaHncN15rsw
7 8 BRAZIL DUNKS https://pandabuy.allapp.link/clvljbnm8nglvnuo8r60 ¥351 $53 https://lh7-rt.googleusercontent.com/sheetsz/AHOq17FI1GveGB7zouQhT0bo1OWrbEB26LDxsPMhfgF8zXXcZ_yGY1r3POWXs2oRFbFrKiyVSlUw9nLOC7D_WXnnaEigos-rqVIrgBhv9hymIXMf04ZOoZFTIbUD-G995q1bU9F9NzUzUDDDoc2HjRn8mF4=w243-h129?key=kbwyiYmsDs8YaHncN15rsw
8 9 DARK MARINA BLUE DUNKS https://pandabuy.allapp.link/clvljgggpf6qoog013e0 ¥351 $53 https://lh7-rt.googleusercontent.com/sheetsz/AHOq17FpjYzZmQsX04TQk4Sb6ohGafMgEahkwa4NCgVGadHqulSaWxp5iW2MC3eI0mA0JSQq0xDgO8mpq1lymma_D3h-Ma6y8vUxsjIc8_n_RhIhwKCdOl7j-7WDBvYSxoZVax4Y99KPEy95gHy89rqhKXCb=w243-h129?key=kbwyiYmsDs8YaHncN15rsw
9 10 UNIVERSITY BLUE DUNKS https://pandabuy.allapp.link/clvlna8gpf6qoog01480 ¥180 $26 https://lh7-rt.googleusercontent.com/sheetsz/AHOq17GDYMrm21e0rl4AOLz0204Wd80Xj5bjTfjwa1eHosvQRveP7thUWfs9qc2k95Mpln7Qghju1_uNR_2hLmOAa9EHPacupB0tKWKXy1nH7IFdV7uRcN5bVHSusMNfkMZz2KAq9xAiJhGRY_lfGeK4OqMf=w243-h129?key=kbwyiYmsDs8YaHncN15rsw
1471