
Find coordinates of point in browser for clicking with pyautogui


I have a script that finds a colored box in a screenshot of a webpage. I want to click the box it found, but pyautogui works in screen coordinates, so the point I get from the browser screenshot doesn't translate directly into the right place to click.

Is there a way to just tell pyautogui to find the coordinates inside the browser window instead of having to convert them?

The clicking part of my script, for reference:

def click_correct_box(driver):
    time.sleep(4)
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".block2"))
    )
    element.screenshot('elem.png')
    element_image = Image.open('elem.png')
    ocr_data = pytesseract.image_to_data(element_image, output_type=pytesseract.Output.DICT)
    text_bounding_box = find_text_bounding_box(ocr_data)
    reference_color = find_reference_color(element_image, text_bounding_box)
    visualize_ocr_data(element_image, ocr_data)
    if reference_color is None:
        print("Failed to analyze reference color")
        return
    time.sleep(2)
    driver.save_screenshot('element_with_boxes.png')
    element_with_boxes_image = Image.open('element_with_boxes.png')
    closest_box = find_color_boxes(element_with_boxes_image, reference_color, text_bounding_box)
    if closest_box:
        x1, y1, x2, y2 = closest_box
        click_x = (x1 + x2) / 2
        click_y = (y1 + y2) / 2
        print(f"Click coordinates: ({click_x}, {click_y})")

        driver.execute_script("window.scrollTo(0, arguments[0]);", y1 - 50)
        time.sleep(1)

        location = element.location
        size = element.size

        # Adjust click_x and click_y to absolute coordinates relative to the browser window
        click_x_absolute = location['x'] + click_x
        click_y_absolute = location['y'] + click_y

        print(f"Absolute click coordinates: ({click_x_absolute}, {click_y_absolute})")

        # Directly use the absolute coordinates relative to the browser window
        pyautogui.moveTo(click_x_absolute, click_y_absolute, duration=1)
        pyautogui.click()
        time.sleep(2)
    else:
        print("No suitable box found")

I've also tried clicking directly with JavaScript, but that didn't work. After several hours of debugging and fine-tuning, I'm still stuck.


Solution

  • From what I can see, you're locating the box in a screenshot of the page. driver.save_screenshot captures only what is currently visible, so scrolling should never be required, and scrolling after the screenshot is taken means the coordinates you computed no longer match what is on screen. You haven't provided enough information for anyone to pin down exactly what's wrong, but my guess is that it's the coordinate math. Try clicking through the browser instead:

    driver.execute_script(f"document.elementFromPoint({click_x}, {click_y}).click();")
    

    Edit: Here's a way to visualize where this method clicks, by adding a small red square at the same coordinates:

    driver.execute_script(f"""const d = document.createElement("div");
                          d.style.position="fixed";
                          d.style.top="{click_y}px";
                          d.style.left="{click_x}px";
                          d.style.height="5px";
                          d.style.width="5px";
                          d.style.backgroundColor="red";
                          d.style.zIndex="9999";
                          document.getElementsByTagName("body")[0].appendChild(d);""")
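
If you would rather stay with pyautogui, the coordinate math I'm guessing at can be sketched roughly like this. Treat it as a minimal sketch under assumptions, not a tested fix for your page. ViewportMetrics, read_viewport_metrics, screenshot_to_viewport and viewport_to_screen are hypothetical helper names I made up. It assumes the screenshot is scaled by window.devicePixelRatio relative to CSS pixels, that all browser toolbars sit above the page, that the window's side and bottom borders are the same width, and that pyautogui's screen units match CSS pixels (which may not hold with OS display scaling).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewportMetrics:
    """Browser window geometry as reported by JavaScript (CSS pixels)."""
    screen_x: float            # window.screenX
    screen_y: float            # window.screenY
    outer_width: float         # window.outerWidth
    outer_height: float        # window.outerHeight
    inner_width: float         # window.innerWidth
    inner_height: float        # window.innerHeight
    device_pixel_ratio: float  # window.devicePixelRatio


# Script for driver.execute_script; the keys match the dataclass fields.
METRICS_JS = """
return {
    screen_x: window.screenX,
    screen_y: window.screenY,
    outer_width: window.outerWidth,
    outer_height: window.outerHeight,
    inner_width: window.innerWidth,
    inner_height: window.innerHeight,
    device_pixel_ratio: window.devicePixelRatio
};
"""


def read_viewport_metrics(driver):
    """Ask the browser for its current geometry (hypothetical helper)."""
    return ViewportMetrics(**driver.execute_script(METRICS_JS))


def screenshot_to_viewport(px, py, metrics):
    """Convert screenshot pixels to CSS pixels inside the viewport.

    Assumption: the screenshot is device_pixel_ratio times larger than
    the viewport measured in CSS pixels.
    """
    ratio = metrics.device_pixel_ratio
    return px / ratio, py / ratio


def viewport_to_screen(vx, vy, metrics):
    """Convert viewport CSS pixels to screen coordinates.

    Assumptions: side borders are symmetric, the bottom border is as wide
    as a side border, and everything else (tabs, address bar) is on top.
    """
    border = (metrics.outer_width - metrics.inner_width) / 2
    top_chrome = metrics.outer_height - metrics.inner_height - border
    return (metrics.screen_x + border + vx,
            metrics.screen_y + top_chrome + vy)
```

To use it, call read_viewport_metrics(driver) right after driver.save_screenshot(...) without scrolling in between, pass the center of closest_box through screenshot_to_viewport and then viewport_to_screen, and hand the result to pyautogui.moveTo. Don't add element.location on top, since the box was found in a screenshot of the whole viewport rather than of the element.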