python-3.x beautifulsoup file-processing

I am trying to delete lines of text in Python that do not start with /


I am trying to scrape a website and then save the links to a text file. In the text file, I would like to delete any line that does not start with "/". How could I do that? This is everything I have so far:

import requests
from bs4 import BeautifulSoup
page = requests.get("https://wiki.stardewvalley.net/Stardew_Valley_Wiki")
soup = BeautifulSoup(page.content, 'html.parser')

wikilinks = []
for con in soup.find_all('div', class_="mainmenuwrapper"):
    # Search inside the matched div (con), not the whole page (soup)
    for links in con.find_all('a', href=True):
        if links.text:
            wikilinks.append(links['href'])

# print(wikilinks)


with open('./scrapeNews/output.txt', 'w') as f:
    for item in wikilinks:
        f.write("%s\n" % item)

Solution

  • You can use the built-in str.startswith() method to check whether a link starts with "/". However, since the scraped href values include entries other than full links, you can instead write only the links that start with "http", rather than filtering for "/".

    ...
    with open("./scrapeNews/output.txt", "w") as f:
        for item in wikilinks:
            if not str(item).startswith("http"):
                continue
            f.write("%s\n" % item)
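If you do want to keep only the lines that start with "/", as the question asks, the same startswith() check works with a different prefix. Below is a minimal sketch, not a definitive implementation: the helper names keep_prefixed and filter_file and the sample list are hypothetical, introduced only for illustration. Note that protocol-relative links such as "//example.com" also start with "/" and would be kept.

```python
from pathlib import Path


def keep_prefixed(links, prefix="/"):
    """Return only the entries that start with `prefix`, preserving order."""
    return [link for link in links if str(link).startswith(prefix)]


def filter_file(path, prefix="/"):
    """Rewrite the text file at `path`, keeping only lines that start with `prefix`."""
    file = Path(path)
    lines = file.read_text().splitlines()
    kept = keep_prefixed(lines, prefix)
    # Write the surviving lines back, one per line.
    file.write_text("".join(line + "\n" for line in kept))
    return kept


# Hypothetical sample data standing in for the scraped hrefs.
sample = ["/Fishing", "https://example.com/page", "#top", "/Crops"]
print(keep_prefixed(sample))          # ['/Fishing', '/Crops']
print(keep_prefixed(sample, "http"))  # ['https://example.com/page']
```

You could call keep_prefixed(wikilinks) before writing the file, or run filter_file("./scrapeNews/output.txt") afterwards to clean up a file that has already been written.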