python colors python-imaging-library counting detect

Checking most used colors in image

I want to get a list of the most used colors in this picture:

I tried the following code, but it takes too long:

from PIL import Image

colors = []
class Color:
    def __init__(self, m, c):
        self.col = c
        self.many = m

im = Image.open("~/.../strowberry.jpeg")
def cool():
    for i in im.getdata():
        i = str(i)
        i = i.replace(", ", "")
        i = i.replace("(", "")
        i = i.replace(")", "")
        i = int(i)
        colors.append(Color(1, i))
    for x in colors:
        num = 0
        for j in range(len(colors)):
            if x.col == colors[num].col:
                del colors[num]
                num -= 1
                x.many += 1
            num += 1
    for obj in colors:
        print(obj.many, obj.col)
cool()

Why is the code so slow and how can I improve the performance?

Solution

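Do not reinvent the wheel. Your code is slow because the second loop compares every entry in colors against every other entry, so the work grows quadratically with the number of pixels. Deleting from the middle of a list also costs O(n) per deletion, and deleting from colors while the outer loop iterates over it can make that loop skip entries. The Python Standard Library contains collections.Counter, which counts hashable items in a single pass using a dictionary. With it you don't need to iterate over the data yourself, define a class, or perform the string operations. The code is short and simple: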

import collections
from PIL import Image

im = Image.open('strawberry.jpg')
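# Counter maps each pixel value (a tuple of channel values for RGB images) to its number of occurrences
counter = collections.Counter(im.getdata())
# most_common() yields (color, count) pairs, most frequent first
for color, count in counter.most_common():
    print(f'{count} times color {color}')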
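Counter is a dict subclass, so the counting amounts to one dictionary update per pixel, which is why it scales linearly instead of quadratically. A minimal sketch of the same idea with a plain dict, assuming a small hypothetical list of RGB tuples in place of im.getdata() (count_colors is an illustrative name, not part of any library):

```python
from collections import Counter


def count_colors(pixels):
    """Count how often each pixel value occurs, in a single pass."""
    counts = {}
    for pixel in pixels:
        # A dict lookup/update is O(1) on average, so the whole loop is O(n).
        counts[pixel] = counts.get(pixel, 0) + 1
    return counts


# Hypothetical stand-in for im.getdata(): a few RGB tuples.
pixels = [(255, 0, 0), (0, 255, 0), (255, 0, 0),
          (0, 0, 255), (255, 0, 0), (0, 255, 0)]

counts = count_colors(pixels)
assert counts == dict(Counter(pixels))  # same result as collections.Counter

# The two most frequent colors, most frequent first.
print(Counter(pixels).most_common(2))  # [((255, 0, 0), 3), ((0, 255, 0), 2)]
```

If you only want the top few colors, most_common(n) with a small n avoids printing hundreds of thousands of entries.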

If you really need the Color objects (for whatever you want to do with them later in your program), you can create them from the counter with this one-liner:

colors = [Color(counter[color], color) for color in counter]

...and if you really need the same integer encoding as in your original code, use this instead:

colors = [Color(counter[color], int(''.join(map(str, color)))) for color in counter]

Note that the two one-liners use list comprehensions, which are Pythonic and often fast as well. The expression int(''.join(map(str, color))) does the same as the five string-processing lines in your first loop: each pixel is a tuple of integers, map(str, ...) converts each integer to a string, ''.join(...) concatenates them, and int(...) turns the result back into a number.
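One caveat with that digit-joining encoding: different colors can map to the same integer, because the channel boundaries are lost. The sketch below shows the collision and, as an alternative that is not part of the original answer, packs each channel into 8 bits. pack_rgb is a hypothetical helper and assumes 3-channel RGB pixels with values 0 to 255 (RGBA pixels would need a fourth channel).

```python
def join_digits(color):
    # The string-concatenation encoding from the answer above.
    return int(''.join(map(str, color)))


def pack_rgb(color):
    # Fixed-width packing, assuming 8-bit R, G, B channels (0-255).
    r, g, b = color
    return (r << 16) | (g << 8) | b


c1, c2 = (1, 23, 4), (12, 3, 4)
print(join_digits(c1), join_digits(c2))  # 1234 1234 (two colors, same key)
print(pack_rgb(c1), pack_rgb(c2))        # 71428 787204 (distinct keys)
```

Keeping the tuples themselves as keys, as the Counter does, avoids the problem entirely.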

All this together took about 0.5 seconds on my machine, excluding the printing (which is slow anyway).
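To compare the approaches on your own machine, here is a rough timing sketch using only the standard library. count_quadratic is a hypothetical helper that mimics the list-scanning idea from the question (for every pixel it searches a list of already-seen colors, so its cost grows with pixels times distinct colors); the random pixel data stands in for im.getdata(). Timings will vary by machine.

```python
import random
import time
from collections import Counter


def count_quadratic(pixels):
    # For every pixel, linearly scan a list of [color, count] pairs.
    seen = []
    for p in pixels:
        for entry in seen:
            if entry[0] == p:
                entry[1] += 1
                break
        else:
            seen.append([p, 1])
    return {color: n for color, n in seen}


random.seed(0)
# Hypothetical test data: 5000 pixels drawn from a palette of 500 random colors.
palette = [(random.randrange(256), random.randrange(256), random.randrange(256))
           for _ in range(500)]
pixels = [random.choice(palette) for _ in range(5000)]

t0 = time.perf_counter()
slow = count_quadratic(pixels)
t1 = time.perf_counter()
fast = Counter(pixels)
t2 = time.perf_counter()

assert slow == dict(fast)  # both methods produce the same counts
print(f"list scan: {t1 - t0:.4f} s, Counter: {t2 - t1:.4f} s")
```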