html web-scraping

Scraping a Website only for its Text


I am helping remodel a website and was wondering whether it is possible to scrape just the text out of the entire site. Scraping one page at a time with DATA SCRAPER is possible, but there are hundreds of pages that need to be worked on. Is there a way to get them all in one scrape? Or are there other suggestions?


Solution

  • If I understand your question correctly, there is a standalone program called HTTrack (https://www.httrack.com/) that will download an entire website to your local computer. I've used it successfully in the past when I needed to grab everything.

    Edit: This is my first answer on Stack Overflow. Why was this voted down? I'd like to know so I don't do it again.
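
A mirror made with a tool such as HTTrack consists of HTML files, so getting "just the text" still needs a second step. The following is a minimal sketch of that step, assuming the mirror already sits in a local folder. It uses only the Python standard library; the names `TextExtractor`, `html_to_text`, `extract_site_text`, and the folder names `mirror` and `text_out` are hypothetical and chosen for illustration. It skips `<script>` and `<style>` contents and does not try to handle every edge case of real-world markup.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of one HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def extract_site_text(mirror_dir, out_dir):
    """Write a .txt file for every .htm/.html file found under mirror_dir."""
    mirror, out = Path(mirror_dir), Path(out_dir)
    for page in mirror.rglob("*.htm*"):
        if not page.is_file():
            continue
        # Assumption: pages are UTF-8; undecodable bytes are replaced.
        html = page.read_text(encoding="utf-8", errors="replace")
        target = (out / page.relative_to(mirror)).with_suffix(".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html_to_text(html), encoding="utf-8")
```

Calling `extract_site_text("mirror", "text_out")` would then walk the mirrored site and write one text file per page, keeping the original folder layout. Navigation menus and footers are text too, so they will appear in every output file unless they are filtered out separately.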