I am trying to scrape a website and then save the links to a text file. In the text file, I would like to delete any line that does not start with "/". How could I do that? This is everything I have so far:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://wiki.stardewvalley.net/Stardew_Valley_Wiki")
soup = BeautifulSoup(page.content, 'html.parser')
wikilinks = []
for con in soup.find_all('div', class_="mainmenuwrapper"):
    for links in soup.find_all('a', href=True):
        if links.text:
            wikilinks.append(links['href'])
# print(wikilinks)
with open('./scrapeNews/output.txt', 'w') as f:
    for item in wikilinks:
        f.write("%s\n" % item)
You can use the string method startswith() to check whether a link starts with "/". However, since the scraped values include other information besides links, you can instead write only the links that start with "http", rather than filtering for "/".
...
with open("./scrapeNews/output.txt", "w") as f:
    for item in wikilinks:
        if not str(item).startswith("http"):
            continue
        f.write("%s\n" % item)
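If you want the filter the question literally asks for (keeping only entries that start with "/"), the same startswith() check works with a different prefix. Below is a minimal sketch, not a definitive solution: the helper name keep_links_with_prefix and the sample list are made up for illustration, and in the script above you would pass wikilinks instead.

```python
def keep_links_with_prefix(links, prefix="/"):
    """Return only the links that start with `prefix` (hypothetical helper)."""
    # str() guards against non-string href values, as in the answer above.
    return [link for link in links if str(link).startswith(prefix)]


# Sample data invented for illustration; replace with `wikilinks` in the real script.
sample_links = ["/Villagers", "https://example.com/page", "#top", "/Fishing"]

filtered = keep_links_with_prefix(sample_links)
print(filtered)
```

Filtering the list before opening the file keeps the write loop simple: you can then write every item in filtered without an extra check. Note that startswith() also accepts a tuple of prefixes, so passing ("/", "http") as prefix would keep both kinds of link. Protocol-relative links such as "//host/path" also begin with "/", so they would pass the "/" filter.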