I'm trying to scrape the links to the 400 models listed on this website: https://www.printables.com/model?category=14&fileType=fff&includeUserGcodes=1, which I refer to as webpage in my code below. However, when I run my code, I get no links.
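import requests
from bs4 import BeautifulSoup

webpage = "https://www.printables.com/model?category=14&fileType=fff&includeUserGcodes=1"
User_agent = {'User-agent': 'Mozilla/5.0 (X11; CrOS i686 4319.74.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36'}
r = requests.get(webpage, headers=User_agent).text
soup = BeautifulSoup(r, 'html5lib')
# href=True skips anchors without an href attribute, which would raise KeyError
for link in soup.find_all('a', href=True):
    print(link['href'])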
So I checked whether the links are present at all with print(soup.prettify()), and none of the desired links appear in the HTML either. This led me to assume that the website doesn't allow scraping, but the response's status_code is 200, which I took to mean that I'm able to scrape.
Is there a different approach I could take? Where else would these links be stored? Thank you.
The data is loaded from an external URL via JavaScript, so BeautifulSoup doesn't see it. To get info about all items, you can use the following example:
import json
import requests
url = "https://www.printables.com/graphql/"
payload = {
    "operationName": "PrintList",
    "query": "query PrintList($limit: Int!, $cursor: String, $categoryId: ID, $materialIds: [Int], $userId: ID, $printerIds: [Int], $licenses: [ID], $ordering: String, $hasModel: Boolean, $filesType: [FilterPrintFilesTypeEnum], $includeUserGcodes: Boolean, $nozzleDiameters: [Float], $weight: IntervalObject, $printDuration: IntervalObject, $publishedDateLimitDays: Int, $featured: Boolean, $featuredNow: Boolean, $usedMaterial: IntervalObject, $hasMake: Boolean, $competitionAwarded: Boolean, $onlyFollowing: Boolean, $collectedByMe: Boolean, $madeByMe: Boolean, $likedByMe: Boolean) {\n morePrints(\n limit: $limit\n cursor: $cursor\n categoryId: $categoryId\n materialIds: $materialIds\n printerIds: $printerIds\n licenses: $licenses\n userId: $userId\n ordering: $ordering\n hasModel: $hasModel\n filesType: $filesType\n nozzleDiameters: $nozzleDiameters\n includeUserGcodes: $includeUserGcodes\n weight: $weight\n printDuration: $printDuration\n publishedDateLimitDays: $publishedDateLimitDays\n featured: $featured\n featuredNow: $featuredNow\n usedMaterial: $usedMaterial\n hasMake: $hasMake\n onlyFollowing: $onlyFollowing\n competitionAwarded: $competitionAwarded\n collectedByMe: $collectedByMe\n madeByMe: $madeByMe\n liked: $likedByMe\n ) {\n cursor\n items {\n ...PrintListFragment\n printer {\n id\n __typename\n }\n user {\n rating\n __typename\n }\n __typename\n }\n __typename\n }\n}\n\nfragment PrintListFragment on PrintType {\n id\n name\n slug\n ratingAvg\n ratingCount\n likesCount\n liked\n datePublished\n dateFeatured\n firstPublish\n downloadCount\n displayCount\n inMyCollections\n foundInUserGcodes\n userGcodeCount\n userGcodesCount\n materials {\n id\n __typename\n }\n category {\n id\n path {\n id\n name\n __typename\n }\n __typename\n }\n modified\n images {\n ...ImageSimpleFragment\n __typename\n }\n filesType\n hasModel\n user {\n ...AvatarUserFragment\n __typename\n }\n ...LatestCompetitionResult\n __typename\n}\n\nfragment AvatarUserFragment on UserType {\n id\n publicUsername\n avatarFilePath\n slug\n badgesProfileLevel {\n profileLevel\n __typename\n }\n __typename\n}\n\nfragment LatestCompetitionResult on PrintType {\n latestCompetitionResult {\n placement\n competitionId\n __typename\n }\n __typename\n}\n\nfragment ImageSimpleFragment on PrintImageType {\n id\n filePath\n rotation\n __typename\n}\n",
    "variables": {
        "categoryId": "14",
        "collectedByMe": False,
        "competitionAwarded": False,
        "cursor": "",
        "featured": False,
        "filesType": ["GCODE"],
        "hasMake": False,
        "includeUserGcodes": True,
        "likedByMe": False,
        "limit": 36,
        "madeByMe": False,
        "materialIds": None,
        "nozzleDiameters": None,
        "ordering": "-first_publish",
        "printDuration": None,
        "printerIds": None,
        "publishedDateLimitDays": None,
        "weight": None,
    },
}
cnt = 0
while True:
    data = requests.post(url, json=payload).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    for i in data["data"]["morePrints"]["items"]:
        cnt += 1
        print(
            cnt,
            i["name"],
            "https://www.printables.com/model/{}-{}".format(i["id"], i["slug"]),
        )

    # an empty cursor means there are no more pages
    if not data["data"]["morePrints"]["cursor"]:
        break

    # request the next page, starting at the returned cursor
    payload["variables"]["cursor"] = data["data"]["morePrints"]["cursor"]
Prints:
1 White Spiral Vase https://www.printables.com/model/189114-white-spiral-vase
2 Calibrating Before Battle - 3DPN Mr. Print-It - Superhero Remix https://www.printables.com/model/188733-calibrating-before-battle-3dpn-mr-print-it-superhe
3 twitter 3d bird https://www.printables.com/model/187083-twitter-3d-bird
4 Welcome To Rapture plaque https://www.printables.com/model/186669-welcome-to-rapture-plaque
...
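If you want the links in a list rather than printed (for example, only the first 400), the same cursor loop can be wrapped in a generator. This is just a sketch under a few assumptions: the names fetch_page and iter_model_links and the max_items parameter are mine, not part of the site's API, and the response is assumed to keep the data -> morePrints -> items/cursor shape used above.

```python
GRAPHQL_URL = "https://www.printables.com/graphql/"
MODEL_URL = "https://www.printables.com/model/{}-{}"


def fetch_page(request_payload):
    """POST one GraphQL request and return the decoded JSON body."""
    # Imported here so the paging logic below can be used with a stand-in
    # fetch function even where requests is not installed.
    import requests

    response = requests.post(GRAPHQL_URL, json=request_payload, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_model_links(payload, fetch=fetch_page, max_items=None):
    """Yield model URLs page by page until the cursor comes back empty.

    `fetch` and `max_items` are hypothetical helpers for this sketch:
    `fetch` takes the request payload and returns the decoded JSON,
    `max_items` stops early once that many links have been yielded.
    """
    # Work on a copy so the caller's payload is not modified.
    variables = dict(payload["variables"])
    request_payload = dict(payload, variables=variables)

    seen = 0
    while True:
        page = fetch(request_payload)["data"]["morePrints"]
        for item in page["items"]:
            yield MODEL_URL.format(item["id"], item["slug"])
            seen += 1
            if max_items is not None and seen >= max_items:
                return
        if not page["cursor"]:
            return
        # Ask for the next page, starting at the returned cursor.
        variables["cursor"] = page["cursor"]
```

With the payload dict from the example above, links = list(iter_model_links(payload, max_items=400)) would collect at most 400 URLs. Passing the fetch function as a parameter is only there so the paging logic can be tried with canned responses before hitting the site.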