linux, wget, txt, lftp, aria2

Get a list of files from an HTTPS server and save it as a txt file


I want to get a list of directories from an HTTPS server for aria2c.

As far as I know, unlike wget, aria2c has no recursive download option, so I am going to use a txt input file as mentioned here.

So I need the list of directories.

This is the target HTTPS site.

I tried lftp, but there were some certificate errors.

I would be grateful if you could let me know how to get the txt file.
Thank you!


Solution

  • Try this hacked-together script.

    function list_folder() {
        echo "Starting new run! $1"
        # Fetch the directory listing page for this sub-path of the base URL
        content=$(curl -s -L 'https://physionet.org/files/mimic3wdb-matched/1.0/'"$1")
        # Folders are the links whose text ends with a `/`; skip the parent-directory link
        folders=$(echo "$content" | grep -o -P '(?<=">).*(?=/</a>)' | grep -v '\.\.')
        # Files are all the entries that don't end with a `/`
        files=$(echo "$content" | grep -o -P '(?<=">).*[^/](?=</a>)')
        echo "FOLDERS: $folders"
        echo "FILES: $files"
        # Recurse into each sub-folder, keeping the path relative to the base URL
        for folder in $folders; do
            list_folder "$1$folder/"
        done
    }

    list_folder ""
    

    It'll recursively walk the directory listing and print every folder and file it finds. If you want to save the file names into a txt file, just redirect the echo of $files into that file; a variant that writes full URLs suitable for aria2c is sketched below.

    You can also try running it in parallel by appending a & to the recursive list_folder calls, as shown in the second sketch below.
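
    As a minimal sketch of that redirection, assuming the same physionet base URL: the variant below appends one absolute URL per line to a file and then hands the list to aria2c via its -i/--input-file option. The file name files.txt and the helper name collect_urls are just placeholders.

    BASE_URL='https://physionet.org/files/mimic3wdb-matched/1.0/'

    function collect_urls() {
        content=$(curl -s -L "$BASE_URL$1")
        folders=$(echo "$content" | grep -o -P '(?<=">).*(?=/</a>)' | grep -v '\.\.')
        files=$(echo "$content" | grep -o -P '(?<=">).*[^/](?=</a>)')
        # One absolute URL per line is the format aria2c's input file expects
        for file in $files; do
            echo "$BASE_URL$1$file" >> files.txt
        done
        for folder in $folders; do
            collect_urls "$1$folder/"
        done
    }

    collect_urls ""

    # Download everything listed in files.txt
    aria2c -i files.txt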
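
    For the & suggestion, here is a rough sketch of how the loop inside list_folder could look: background each recursive call, then wait for all of them so the function does not return while its children are still fetching listings.

        # Inside list_folder, background each recursive call
        for folder in $folders; do
            list_folder "$1$folder/" &
        done
        # Wait for the backgrounded calls started by this invocation to finish
        wait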