I have a script that finds a colored box in a screenshot of a webpage. I want to click the box it found, but pyautogui works in screen coordinates, so the points I get from the browser screenshot don't map directly onto the right clickable point.
Is there a way to tell pyautogui to use coordinates inside the browser window, instead of having to convert them myself?
The clicking part of my script, for reference:
def click_correct_box(driver):
    time.sleep(4)
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".block2"))
    )
    element.screenshot('elem.png')
    element_image = Image.open('elem.png')
    ocr_data = pytesseract.image_to_data(element_image, output_type=pytesseract.Output.DICT)
    text_bounding_box = find_text_bounding_box(ocr_data)
    reference_color = find_reference_color(element_image, text_bounding_box)
    visualize_ocr_data(element_image, ocr_data)
    if reference_color is None:
        print("Failed to analyze reference color")
        return
    time.sleep(2)
    driver.save_screenshot('element_with_boxes.png')
    element_with_boxes_image = Image.open('element_with_boxes.png')
    closest_box = find_color_boxes(element_with_boxes_image, reference_color, text_bounding_box)
    if closest_box:
        x1, y1, x2, y2 = closest_box
        click_x = (x1 + x2) / 2
        click_y = (y1 + y2) / 2
        print(f"Click coordinates: ({click_x}, {click_y})")
        driver.execute_script("window.scrollTo(0, arguments[0]);", y1 - 50)
        time.sleep(1)
        location = element.location
        size = element.size
        # Adjust click_x and click_y to absolute coordinates relative to the browser window
        click_x_absolute = location['x'] + click_x
        click_y_absolute = location['y'] + click_y
        print(f"Absolute click coordinates: ({click_x_absolute}, {click_y_absolute})")
        # Directly use the absolute coordinates relative to the browser window
        pyautogui.moveTo(click_x_absolute, click_y_absolute, duration=1)
        pyautogui.click()
        time.sleep(2)
    else:
        print("No suitable box found")
I've also tried clicking directly with JavaScript, but that didn't work. After several hours of debugging and fine-tuning, I'm still stuck.
From what I can see, you're locating the box within a screenshot of the page (driver.save_screenshot). That means scrolling should never be required, because the screenshot only includes what is currently visible. The question doesn't include enough information to pin down exactly what's wrong, but my guess is that it's the coordinate math: the box coordinates already come from the page screenshot, so adding element.location on top of them likely offsets the point a second time. Try this:
driver.execute_script(f"document.elementFromPoint({click_x}, {click_y}).click();")
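One more thing worth checking in the coordinate math: document.elementFromPoint expects CSS pixels relative to the viewport, while a screenshot is measured in image pixels, and on high-DPI displays the two can differ by window.devicePixelRatio. Here is a minimal sketch of that conversion, assuming the box was found in a viewport screenshot taken with driver.save_screenshot. The helper names (box_center, image_to_css_point, click_box_in_viewport) are hypothetical and not part of Selenium:

```python
def box_center(box):
    """Return the center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2


def image_to_css_point(point, device_pixel_ratio):
    """Convert screenshot image pixels to CSS pixels in the viewport.

    Assumption: the screenshot is scaled by devicePixelRatio relative
    to the page's CSS pixels.
    """
    x, y = point
    return x / device_pixel_ratio, y / device_pixel_ratio


def click_box_in_viewport(driver, box):
    """Click the element under the center of `box` (hypothetical helper).

    `driver` is expected to be a Selenium WebDriver; `box` is in
    screenshot image pixels.
    """
    dpr = driver.execute_script("return window.devicePixelRatio;")
    css_x, css_y = image_to_css_point(box_center(box), dpr)
    # Pass the numbers as script arguments instead of formatting them in.
    driver.execute_script(
        "document.elementFromPoint(arguments[0], arguments[1]).click();",
        css_x,
        css_y,
    )
    return css_x, css_y
```

If devicePixelRatio is 1 the conversion is a no-op, so this only changes behavior on scaled displays.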
Edit: Here's a way to visualize where you're clicking using this method by adding a red square at the coordinates:
driver.execute_script(f"""const d = document.createElement("div");
    d.style.position = "absolute";
    d.style.top = "{click_y}px";   // top is the vertical (y) offset
    d.style.left = "{click_x}px";  // left is the horizontal (x) offset
    d.style.height = "5px";
    d.style.width = "5px";
    d.style.backgroundColor = "red";
    d.style.zIndex = "9999";
    document.getElementsByTagName("body")[0].appendChild(d);""")
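If you still want pyautogui to do the click, the viewport point has to be translated into screen coordinates. The sketch below is an approximation, not something pyautogui or Selenium provides: it assumes the browser's toolbars sit entirely above the viewport, the side borders are symmetrical, the page zoom is 100%, and the browser is not headless. The helper names (read_window_metrics, viewport_to_screen) and the scale parameter are hypothetical; scale is there for setups where pyautogui's units differ from CSS pixels because of OS display scaling:

```python
def read_window_metrics(driver):
    """Ask the browser for its window position and sizes (CSS pixels)."""
    return driver.execute_script(
        "return {screenX: window.screenX, screenY: window.screenY,"
        " outerWidth: window.outerWidth, innerWidth: window.innerWidth,"
        " outerHeight: window.outerHeight, innerHeight: window.innerHeight};"
    )


def viewport_to_screen(css_x, css_y, metrics, scale=1.0):
    """Approximate the screen position of a viewport point.

    Assumptions: all browser chrome is above the viewport and the left
    and right window borders have equal width.
    """
    border_x = (metrics["outerWidth"] - metrics["innerWidth"]) / 2
    chrome_y = metrics["outerHeight"] - metrics["innerHeight"]
    screen_x = (metrics["screenX"] + border_x + css_x) * scale
    screen_y = (metrics["screenY"] + chrome_y + css_y) * scale
    return screen_x, screen_y
```

The result could then be passed to pyautogui.moveTo(screen_x, screen_y) followed by pyautogui.click(). Since the offsets are estimates, it is worth checking them against the red square before relying on them.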