wget

Creating directories with wget


I need to download files from several pages, using wget -r -l 1 -nd -H --accept-regex 'https://blogspot.com/s[0-9]{4}/[0-9]{3}.pdf' -i list.txt; in the TXT file I have a list of all the pages from which I need to download, one per line, like

https://blogspot.com/test/001

https://blogspot.com/test/002

and so on.

I'm trying to create different folders for each source, so that all the files downloaded from https://blogspot.com/test/001 are in a folder named 001, all those from https://blogspot.com/test/002 are in a folder 002, and so on.

How could I do that?


Solution

  • You can use -P to tell GNU wget where to store downloads, e.g.

    wget -P examplepage -np -r -l 1 http://www.example.com
    

    will store what it downloads inside the examplepage directory. That directory will be created if it does not exist yet.

    I'm trying to create different folders for each source, so that all the files downloaded from https://blogspot.com/test/001 are in a folder named 001, all those from https://blogspot.com/test/002 are in a folder 002, and so on.

    I do not know if it is possible with a single wget call. You can use a loop to process the file line by line. For example, say the content of urls.txt is

    http://www.example.com?page=001
    http://www.example.com?page=002
    http://www.example.com?page=003
    

    and I want the 1st to go into a directory named 001, the 2nd into a directory named 002, and the 3rd into a directory named 003. I could do that with

    while IFS= read -r line; do
        # strip everything up to and including "page=" to get the directory name
        dirname=$(echo "$line" | sed 's/.*page=//')
        wget -P "$dirname" "$line"
    done < urls.txt
    

    Explanation: I use a while loop to process the file named urls.txt line by line, and I use GNU sed to build the directory name by removing everything up to and including page= from the URL.
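
For URLs shaped like the ones in the question (https://blogspot.com/test/001), the directory name is the last path segment rather than a page= parameter. The following is only a sketch under that assumption: dir_for_url and download_all are hypothetical helper names, the WGET override exists purely so the loop can be dry-run with echo, and the wget options are copied unchanged from the question.

```shell
#!/bin/sh
# Hypothetical helper: print the last path segment of a URL,
# e.g. https://blogspot.com/test/001 -> 001
dir_for_url() {
    u=${1%/}                      # drop one trailing slash, if present
    printf '%s\n' "${u##*/}"      # keep what follows the last slash
}

# Hypothetical helper: run one wget per URL listed in the file "$1",
# saving each page's downloads into its own directory via -P.
# WGET may be overridden (e.g. WGET=echo) for a dry run; that is an
# assumption of this sketch, not a wget feature.
download_all() {
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue          # skip blank lines
        ${WGET:-wget} -r -l 1 -nd -H \
            --accept-regex 'https://blogspot.com/s[0-9]{4}/[0-9]{3}.pdf' \
            -P "$(dir_for_url "$url")" "$url"
    done < "$1"
}
```

Calling download_all list.txt would then start one wget run per line of list.txt. Running WGET=echo download_all list.txt first prints the commands without downloading anything, which is a cheap way to check the directory names.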