google-drive-api google-colaboratory wget yandex

How to download a file from Yandex Disk to Google Drive directly using Google Colab?


Suppose I have an archive or file on Yandex Disk, such as https://disk.yandex.ru/d/KGA6qXDT87pTVA. I want to download it directly to my Google Drive. How can I do that? My first idea is to:

  1. Mount drive
from google.colab import drive

drive.mount('/content/drive/path/to/my/dir')
  2. Use Jupyter command-line commands
! wget https://disk.yandex.ru/d/KGA6qXDT87pTVA

But this downloads an HTML page, not the file itself. So to use wget I need a direct link to the file. I can't find such a link on the page, so what should I do?


Solution

  • Another solution that I came up with is to parse the HTML of the page and pull out the download button's link using BeautifulSoup. However, many sites protect their pages against automated access with a CAPTCHA, so scraping is unreliable. Hence the only proper way to do it without violating any rules is to use API requests.

    The official Yandex documentation describes in detail how to do this:
    https://yandex.com/dev/disk/api/concepts/about.html

    Unfortunately, this doesn't help in every case: some file-sharing platforms provide no API and also protect their pages with a CAPTCHA. In those situations you have to download the file manually, without any automation.
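For the Yandex Disk case, the API route can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the share link is public and that the REST API's public-resources download endpoint returns a JSON object with a temporary direct link in its "href" field without requiring an OAuth token. The helper names (build_api_url, get_direct_link, download_to) and the destination file name are made up for illustration.

```python
import json
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# Yandex Disk REST API endpoint that returns a download link for a public resource.
API_ENDPOINT = "https://cloud-api.yandex.net/v1/disk/public/resources/download"


def build_api_url(public_url):
    """Build the API request URL for a public share link (hypothetical helper)."""
    return API_ENDPOINT + "?" + urlencode({"public_key": public_url})


def get_direct_link(public_url, opener=urlopen):
    """Ask the API for a direct download link.

    Assumes the JSON response carries the link in its "href" field.
    `opener` is injectable so the function can be tried without network access.
    """
    with opener(build_api_url(public_url)) as response:
        return json.load(response)["href"]


def download_to(public_url, dest_path):
    """Stream the file behind a public share link to dest_path."""
    direct_link = get_direct_link(public_url)
    with urlopen(direct_link) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


# Example usage in Colab, after Google Drive has been mounted at /content/drive:
# download_to("https://disk.yandex.ru/d/KGA6qXDT87pTVA",
#             "/content/drive/MyDrive/archive.zip")  # hypothetical destination
```

Writing straight into the mounted Drive folder avoids a separate copy step. If you prefer wget, the link returned by get_direct_link can be passed to it instead of the share page URL.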