python computer-vision object-detection pyautogui image-recognition

How to find an image on screen when fast-moving objects pass over it?


I am currently learning image recognition, and my goal is to extract all the information from the board of a mobile game that looks like this: board

As you can see, there are five different dice, and their level can be between 1 and 7.

I know that AI-based recognition could solve this, but I think it may be overkill. Another idea I have is to average, let's say, 20 frames so that the projectiles are erased (I'm not really sure this works).

For example, projectiles can look like this: dice with projectiles

I want to get all the information from the board: a 5×3 array holding the dice type and level for every position.

I tried using pyautogui to search the screen for each possible appearance of a die, but I encountered two problems:

I ran all my tests on the top-left position of the board with this:


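import pyautogui  # the confidence argument requires OpenCV to be installed

def computeBoard():
  # Look for each known dice image in the top-left cell of the board.
  saw = False
  for dice in dicesList:
    try:
      # Older pyautogui versions return None on a miss; newer ones raise.
      if pyautogui.locateOnScreen(dice, region=(940, 580, 65, 65), confidence=0.90) is not None:
        print("I can see " + dice)
        saw = True
        break
    except pyautogui.ImageNotFoundException:
      pass
  if not saw:
    print("I can't see any dice")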

Lowering the confidence threshold helps detection, but then the program confuses different levels with one another.

(dicesList is simply a list of all the image paths.)

Also, I chose Python because it was used in the tutorial I watched on YouTube, but I can switch languages without any problem if needed.

Are there solutions to my problem?

Thanks in advance :)


Solution

  • Another idea i have is to make an average of lets say 20 frames so the projectiles will be erased ...

    Are there solutions to my problem?

    Yes, absolutely, you can easily "erase" the moving objects.

    But prefer the median to the average. Averaging leaves faint ghost traces of the projectiles behind, whereas the median is a robust statistic that completely discards such outliers.

    Suppose a projectile reliably enters and then leaves any given point within N frames. Then the projectile covers that point in fewer than half of slightly more than 2 × N frames, so computing each point's median brightness across that many frames reliably removes any projectile artifacts. (If another projectile follows closely on its heels, you will need additional input frames.)
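
To make the median-versus-average point concrete, here is a minimal sketch on toy data. It assumes grayscale frames stored as plain nested lists, a projectile that covers a pixel for at most N = 3 frames, and a window of 2 × N + 1 frames. The helper names `temporal_median` and `temporal_mean`, the toy board values, and the choice of N are all made up for illustration; they are not part of pyautogui or of the game.

```python
from statistics import mean, median


def temporal_median(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def temporal_mean(frames):
    """Per-pixel average over the same frames, for comparison."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [mean(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Toy data (hypothetical): a static 2x3 "board" of brightness values.
BACKGROUND = [[10, 20, 30], [40, 50, 60]]
N = 3               # assumed: a projectile covers a pixel for at most N frames
WINDOW = 2 * N + 1  # slightly more than 2 * N frames

frames = []
for t in range(WINDOW):
    frame = [row[:] for row in BACKGROUND]
    if t < N:
        # A bright projectile sits on pixel (0, 1) during the first N frames.
        frame[0][1] = 255
    frames.append(frame)

cleaned = temporal_median(frames)
blurred = temporal_mean(frames)

print(cleaned[0][1])  # 20: the projectile is gone, the background value is back
print(blurred[0][1])  # about 120.7: the average keeps a ghost of the projectile
```

With real screenshots held as NumPy arrays, the same reduction is usually written as `np.median(np.stack(frames), axis=0)`. The cleaned frame could then be passed to whatever template matching you already use, in place of a live capture.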