I want to get a list of directories from an HTTPS server to use with aria2c. As far as I know, unlike wget, aria2c has no recursive option, so I am going to use a txt file of URLs as mentioned here. That means I need the list of directories.
This is the target HTTPS site. I tried lftp, but there were some certificate errors. I would be grateful if you could let me know how to get the txt file.
Thank you!
Try this hacked-together script:
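
    # $1 is the path below the base URL: empty for the root, otherwise ending in `/`
    list_folder() {
        local path="$1" content folders files folder
        echo "Starting new run! $path"
        content=$(curl -s -L 'https://physionet.org/files/mimic3wdb-matched/1.0/'"$path")
        # folders are the entries that end with a `/`, minus the `..` parent link
        folders=$(echo "$content" | grep -o -P '(?<=">).*(?=/</a>)' | grep -v '\.\.')
        # files are all the entries that don't end with a `/`
        files=$(echo "$content" | grep -o -P '(?<=">).*[^/](?=<\/a>)')
        echo "FOLDERS: $folders"
        echo "FILES: $files"
        for folder in $folders; do
            # keep the trailing slash so the next URL is well-formed (no `//`)
            list_folder "$path$folder/"
        done
    }

    list_folder ""
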
It recursively walks the directory listing and prints every folder and file it finds. If you want to save the file names, just redirect $files into a file.
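
Since aria2c reads its input file as one URI per line, the bare file names need to be turned into full URLs before they are saved. Here is a minimal sketch of that step, assuming the listing markup matches the same pattern the script greps for. The names listing_to_urls and urls.txt are hypothetical, not part of the script above.

```shell
# Hypothetical helper: read one directory-listing page on stdin and print an
# absolute URL for every file entry. $1 is the URL of that page, ending in "/".
listing_to_urls() {
    local page_url="$1" name
    # same "file" pattern as the script: link text that does not end with "/"
    grep -o -P '(?<=">).*[^/](?=</a>)' | while IFS= read -r name; do
        printf '%s%s\n' "$page_url" "$name"
    done
}

# Possible usage (needs network access, so it is left commented out):
#   url='https://physionet.org/files/mimic3wdb-matched/1.0/'
#   curl -s -L "$url" | listing_to_urls "$url" >> urls.txt
#   aria2c -i urls.txt
```

Appending with >> lets every recursive call add its own files to the same list.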
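You can also try running it in parallel by appending a & to the list_folder calls. These are background jobs rather than threads, so the output from different folders may interleave, and you need a wait at the end if anything has to run after all of them finish.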
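
A rough sketch of that background-job pattern, with each job writing to its own file so the results do not interleave. Here list_one is a hypothetical stand-in for list_folder so the example runs without network access, and run_parallel and the output directory are likewise assumptions.

```shell
# Hypothetical stand-in for list_folder; it only echoes what it was asked to list.
list_one() {
    echo "listing $1"
}

# Start one background job per name, each writing to its own file in $1,
# then block until every job has finished and print the combined results.
run_parallel() {
    local outdir="$1" name
    shift
    for name in "$@"; do
        list_one "$name" > "$outdir/$name.txt" &
    done
    wait    # without this, the cat below could run before the jobs are done
    cat "$outdir"/*.txt
}
```

Starting a job for every folder at once can open a lot of connections to the server, so it may be worth limiting how many run at the same time.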