python docker selenium-webdriver flask

Problems downloading a webpage and clicking a button with Selenium in Docker using Python


I cannot get this to work for the life of me. I'm trying to load a web page and click a button on it. Depending on what I try, Selenium either fails to load, complains that it can't create a session, complains that it doesn't have the proper options, loads forever, or simply doesn't work.

Dockerfile

FROM python:3.11-slim-buster

USER root

# Create a non-root user
RUN useradd -ms /bin/bash appuser
WORKDIR /app
RUN chown appuser:appuser /app

USER appuser

COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the application source
COPY src .

# Expose the application port (e.g., 5000)
EXPOSE 5000

# Define the command to run the application
CMD ["python3", "app.py"]

docker-compose.yml

version: '3.8'

services:
  chrome:
    image: selenium/node-chrome:3.14.0-gallium
    volumes:
      - /dev/shm:/dev/shm
    depends_on:
      - hub
    environment:
      HUB_HOST: hub
  hub:
    image: selenium/hub:3.14.0-gallium
    ports:
      - "4444:4444"

  web:
    build: .
    depends_on:
      - hub
    volumes:
      - ./src:/app
    ports:
      - "5000:5000"

app.py

from flask import Flask, render_template, request
import requests
import re
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import urllib.parse
from selenium.webdriver.chrome.options import Options
def download_page(url):
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.page_load_strategy = 'normal'
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--lang=en')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--allow-running-insecure-content')
    chrome_options.add_argument('--disable-notifications')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-browser-side-navigation')
    chrome_options.add_argument('--mute-audio')
    chrome_options.add_argument('--force-device-scale-factor=1')
    chrome_options.add_argument('--window-size=1080,760')
    driver = webdriver.Remote('http://hub:4444/wd/hub')

    driver.get(url)
    # Process page or click buttons

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/process', methods=['POST'])
def process():
    url = request.form['url'] 

    download_page(url)
    return "URL processing complete!"

if __name__ == '__main__':
    app.run(host='0.0.0.0',debug=True)

index.html

<!DOCTYPE html>
<html>
<head>
    <title>URL Processor</title>
</head>
<body>
    <h1>Enter a URL to process:</h1>
    <form method="POST" action="/process">
        <input type="text" name="url" placeholder="Enter URL here">
        <button type="submit">Process URL</button>
    </form>
</body>
</html>

I have tried using selenium/standalone-chrome as the Docker base image, but pip refuses to install Flask there because the environment is "externally managed".

I have tried running Selenium externally, but it complains that it can't create a session: SessionNotCreatedException.

I tried running it internally, but it complains that it can't find the ChromeDriver, and when I tried installing it, the install just hung: no error, nothing, it just sat there.

If I run it standalone, without Flask, it works perfectly fine. It's only when I try to wrap it in a Docker image that it stops me at every turn. It also doesn't help that the Selenium documentation is outdated.


Solution

  • You are creating the Chrome options but never passing them to the WebDriver, so the remote session is requested without them.

    When I pass the options in, it works fine for me.

    Change this line:

    driver = webdriver.Remote('http://hub:4444/wd/hub')
    

    to

    driver = webdriver.Remote('http://hub:4444/wd/hub', options=chrome_options)
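
For context, here is a minimal sketch of how the whole function might look with that change applied. It is only an illustration, not the asker's final code: the flag list is trimmed to a few of the flags from the question, the `button_css` selector and the `timeout` value are hypothetical placeholders, and the explicit wait and the `driver.quit()` call are additions that the question does not contain.

```python
# Hub address as used in the question (compose service name "hub").
HUB_URL = "http://hub:4444/wd/hub"

# A trimmed subset of the Chrome flags from the question.
CHROME_ARGS = [
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--window-size=1080,760",
]


def download_page(url, button_css="button[type='submit']", timeout=10):
    """Load `url` on the remote Chrome node, click one button, return the HTML.

    `button_css` and `timeout` are hypothetical defaults; adjust them to the
    page you are actually automating.
    """
    # Imported inside the function so this module can still be imported
    # in an environment where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    chrome_options = Options()
    for arg in CHROME_ARGS:
        chrome_options.add_argument(arg)

    # The fix from the answer: pass the options to the remote driver.
    driver = webdriver.Remote(command_executor=HUB_URL, options=chrome_options)
    try:
        driver.get(url)
        # Wait until the button can be clicked instead of sleeping.
        button = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, button_css))
        )
        button.click()
        return driver.page_source
    finally:
        # Always release the session so the grid node is freed.
        driver.quit()
```

Calling `driver.quit()` in a `finally` block is a design choice worth keeping in a Flask handler: without it, each failed request could leave a session open on the Chrome node.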