We have an external website with a link to download a file, and I want to download this file into an S3 bucket. The website requires a login before the file can be downloaded.
How can we implement this using AWS services? I have no idea how to connect to an external website programmatically, pass a username and password to log in, and click the download link so that the file ends up in the S3 bucket.
Can anyone give me some idea of how to implement this scenario? Also, how does AWS connect to external websites, and what security precautions do we need to take when connecting to them?
I really appreciate your help, and please don't hold it against me that I haven't provided a code snippet.
I'm really looking for some insights to understand this process. Any documentation you can point me to would also help.
Thanks, Bab.
For that, you can use Selenium (a browser-automation tool with Python bindings) to handle the login and download, and Boto3 (the AWS SDK for Python) to upload the file to S3.
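You can run this in Lambda, or in AWS Batch if the job is heavier than Lambda allows (Lambda caps execution at 15 minutes, and Chrome is not included in its runtimes, so the browser and driver must be packaged in a container image or layer). Either way, the Lambda function or Batch job needs outbound internet access (inside a VPC, that means a route through a NAT gateway) and an IAM role with permission to upload to the S3 bucket.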
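The S3 permission mentioned above can be kept narrow. As a rough sketch, the role could be limited to writing objects under a single prefix; the function name, bucket name, and prefix here are placeholders of mine, not anything prescribed by AWS:

```python
import json


def build_upload_only_policy(bucket_name: str, key_prefix: str) -> dict:
    """Return an IAM policy document that only allows writing objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowUploadToOnePrefix",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level ARN: bucket name, then the key prefix with a wildcard.
                "Resource": f"arn:aws:s3:::{bucket_name}/{key_prefix}*",
            }
        ],
    }


# Placeholder bucket and prefix; replace with your own values.
policy = build_upload_only_policy("your-bucket-name", "path/to/")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the execution role of the Lambda function or Batch job, so the script can upload the file but cannot read or delete anything else in the bucket.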
Here is a small example snippet showing how to do it:
import boto3
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


def download_and_upload_to_s3():
    # Set up a headless Chrome WebDriver
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")  # Needed where /dev/shm is small, e.g. Lambda
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        # Log in to the website (URL, credentials, and element IDs are placeholders)
        website_url = "https://example.com/login"
        username = "your_username"
        password = "your_password"
        driver.get(website_url)
        # Fill in the login form
        driver.find_element(By.ID, "username_field_id").send_keys(username)
        driver.find_element(By.ID, "password_field_id").send_keys(password)
        driver.find_element(By.ID, "login_button_id").click()
        # Wait for the login to finish; assumes the site redirects after a successful login
        WebDriverWait(driver, 30).until(EC.url_changes(website_url))
        # Reuse the browser's session cookies to fetch the file itself;
        # driver.page_source would only return the rendered HTML, not the file.
        # Assumes the site authenticates downloads with cookies alone.
        session = requests.Session()
        for cookie in driver.get_cookies():
            session.cookies.set(cookie["name"], cookie["value"])
        file_path = "/tmp/downloaded_file"  # Temporary location in Lambda or Batch
        response = session.get("https://example.com/download/file", stream=True)
        response.raise_for_status()
        with open(file_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        # Upload the downloaded file to S3
        s3_client = boto3.client("s3")
        bucket_name = "your-bucket-name"
        s3_key = "path/to/your/file"
        s3_client.upload_file(file_path, bucket_name, s3_key)
    finally:
        driver.quit()


# Run the function
download_and_upload_to_s3()
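On the security side, the `username` and `password` placeholders in the snippet should not be hard-coded in a real deployment. One option is to keep them in AWS Secrets Manager and read them at run time. This is only a sketch: the helper name, the secret name, and the JSON layout of the secret are assumptions of mine.

```python
import json


def load_site_credentials(secrets_client, secret_id: str) -> tuple[str, str]:
    """Fetch the website login from AWS Secrets Manager.

    Assumes the secret is stored as a JSON string such as
    {"username": "...", "password": "..."}; that layout is hypothetical.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

You would call it with something like `load_site_credentials(boto3.client("secretsmanager"), "external-site/login")`, where the secret name is made up for illustration. The role running the script then also needs `secretsmanager:GetSecretValue` on that one secret.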
You do not connect the AWS account to the website. Instead, you write a script that downloads the file and uploads it to your S3 bucket. You can even run it locally or anywhere else outside AWS; you just need credentials for the AWS account that owns the bucket.
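If the site's login turns out to be a plain HTML form (no JavaScript, CSRF token, or MFA step), such a script may not need a browser at all. Below is a hedged sketch using a `requests`-style session and the Boto3 S3 client's `upload_fileobj`; the function name and the form field names `username` and `password` are assumptions about your site, not something I can confirm.

```python
def copy_download_to_s3(session, s3_client, login_url, download_url,
                        username, password, bucket_name, s3_key):
    """Log in with a form POST, then stream the download straight into S3."""
    # The form field names below are assumptions; check the site's login form.
    login = session.post(
        login_url,
        data={"username": username, "password": password},
        timeout=30,
    )
    login.raise_for_status()

    # stream=True avoids loading the whole file into memory.
    response = session.get(download_url, stream=True, timeout=30)
    response.raise_for_status()
    # Let the underlying stream undo any gzip/deflate content encoding.
    response.raw.decode_content = True
    # upload_fileobj reads from the file-like object in chunks, so nothing is written to disk.
    s3_client.upload_fileobj(response.raw, bucket_name, s3_key)
```

In practice you would pass `requests.Session()` and `boto3.client("s3")` as the first two arguments. Taking them as parameters keeps the function easy to test with stand-ins, and the session object carries the login cookies from the POST over to the download request.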