I recently made a request against the Google Cloud Service API endpoint and used wget to download a lot of files into a single folder. Because every sub-directory separator / is replaced by %2F and ?alt=media is appended, all the downloaded file names are contaminated with these strings, e.g.
hg38%2Fv0%2FHomo_sapiens_assembly38.dict?alt=media
hg19%2Fv0%2FHomo_sapiens_assembly19.fasta.alt?alt=media
I tested the following in bash and it returned the result I wanted:
echo "$hg19%2Fv0%2FHomo_sapiens_assembly19.fasta.alt?alt=media" | sed -e "s/^$hg19%2Fv0%2F//" -e "s/\?.*//g"
i.e. Homo_sapiens_assembly19.fasta.alt. Unfortunately when I scaled it up using,
for file in *; do
mv "$file" '$(echo "$file" | sed -e "s/^$hg19%2Fv0%2F//" -e "s/\?.*//g")' ;
done
all the files turned into 1 file named "$file". I couldn't figure out why.
Can anyone please provide a solution to my problem? And if some of the files contain a different number of "%2F" repeats, how can I elegantly keep only the string after the last "%2F" and strip the "?alt=media" from the end in the same line?
Thank you in advance.
Use .* to match everything up to the last %2F.
Put the command substitution inside double quotes, not single quotes. Inside single quotes, $(...) is kept as literal text instead of being executed. See Difference between single and double quotes in Bash.
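Don't put $ before hg at the beginning. Inside double quotes, $hg19 is expanded as a shell variable (presumably unset, so it becomes an empty string) rather than matched literally.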
It's not a requirement, but sed commands are usually put in single quotes, unless you're using variables in the substitution.
for file in *; do
    mv "$file" "$(echo "$file" | sed -e 's/^hg.*%2F//' -e 's/\?.*//g')"
done
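
As an alternative sketch, the same renaming can probably be done with shell parameter expansion alone, without calling sed for every file. The function names strip_gcs_name and rename_downloads are made up for illustration, and the code assumes that every unwanted prefix ends in %2F and that the unwanted suffix starts at the first ? in the name.

```shell
#!/usr/bin/env bash

# strip_gcs_name (hypothetical helper): print the cleaned-up name
# for one downloaded file name.
strip_gcs_name() {
    local name=$1
    name=${name##*%2F}   # remove the longest prefix ending in %2F (up to the last %2F)
    name=${name%%\?*}    # remove the longest suffix starting with a literal ?
    printf '%s\n' "$name"
}

# rename_downloads (hypothetical wrapper): rename every regular file
# in the given directory (default: current directory).
rename_downloads() {
    local dir=${1:-.} file new
    for file in "$dir"/*; do
        [ -f "$file" ] || continue               # skip directories and unmatched globs
        new=$dir/$(strip_gcs_name "${file##*/}")
        [ "$new" = "$file" ] && continue         # nothing to strip
        mv -n -- "$file" "$new"                  # -n: do not overwrite an existing file
    done
}
```

Calling rename_downloads with no argument works on the current directory. The -n option is there because two downloads from different sub-directories could reduce to the same base name, in which case the second one is left untouched rather than overwriting the first.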