I want to find the most frequently used colors in a picture (a photo of a strawberry).
I tried the following code, but it takes too long:
```python
from PIL import Image

colors = []

class Color:
    def __init__(self, m, c):
        self.col = c
        self.many = m

im = Image.open("~/.../strowberry.jpeg")

def cool():
    for i in im.getdata():
        i = str(i)
        i = i.replace(", ", "")
        i = i.replace("(", "")
        i = i.replace(")", "")
        i = int(i)
        colors.append(Color(1, i))

    for x in colors:
        num = 0
        for j in range(len(colors)):
            if x.col == colors[num].col:
                del colors[num]
                num -= 1
                x.many += 1
            num += 1

    for obj in colors:
        print(obj.many, obj.col)

cool()
```
Why is the code so slow and how can I improve the performance?
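Do not reinvent the wheel. Your code is slow mainly because of the nested loop: for every entry in `colors` (initially one per pixel) it scans the whole list again, so the work grows quadratically with the number of pixels, and every `del colors[num]` also has to shift all the elements after it. The Python Standard Library contains `collections.Counter`, which does this counting much more efficiently: it is a dictionary keyed by color, so each pixel costs a single hash lookup. With it, you don't need to write the loop yourself, define a class, or do the string operations. The code is very short and simple: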
```python
import collections
from PIL import Image

im = Image.open('strawberry.jpg')
counter = collections.Counter(im.getdata())
for color in counter:
    print(f'{counter[color]} times color {color}')
```
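Iterating over a `Counter` yields colors in the order they were first seen, not by frequency. Since the question asks for the *most used* colors, `Counter.most_common(n)` may be closer to what you want. The sketch below wraps it in a hypothetical helper `top_colors` and, so that it runs without an image file, uses a tiny synthetic list of RGB tuples as a stand-in for `im.getdata()`.

```python
from collections import Counter

def top_colors(pixels, n=5):
    """Return the n most frequent colors as (color, count) pairs, most frequent first.

    `pixels` is any iterable of hashable color values, such as the RGB tuples
    yielded by PIL's Image.getdata().
    """
    return Counter(pixels).most_common(n)

# Synthetic stand-in for im.getdata(): 3 red, 2 green and 1 blue pixel.
pixels = [(255, 0, 0), (0, 255, 0), (255, 0, 0),
          (0, 0, 255), (255, 0, 0), (0, 255, 0)]

for color, count in top_colors(pixels, 2):
    print(f'{count} times color {color}')
```

With a real image you would pass `im.getdata()` instead of `pixels`. If the image is not in RGB mode (for example a palette image, whose pixels are plain integers), calling `im.convert('RGB')` first should give you RGB tuples.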
If you really need the `Color` objects (for whatever you want to do with them later in your program), you can easily create them from the counter object using this one-liner:
```python
colors = [Color(counter[color], color) for color in counter]
```
...and if you really need the colors in the same integer format as in your original code, use this instead:
```python
colors = [Color(counter[color], int(''.join(map(str, color)))) for color in counter]
```
Note that both one-liners use list comprehensions, which are very Pythonic and in many cases fast as well.
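A variation on the first one-liner: if you also want the `Color` objects ordered from most to least frequent, you can build them from `counter.most_common()`, which (called without an argument) returns every `(color, count)` pair sorted by count. The `Color` class is copied from the question; the helper name `to_color_objects` and the sample counter are hypothetical, for illustration only.

```python
from collections import Counter

class Color:
    """Same container as in the question: `many` is the count, `col` the color."""
    def __init__(self, m, c):
        self.col = c
        self.many = m

def to_color_objects(counter):
    # most_common() with no argument returns all (color, count) pairs,
    # ordered from most to least frequent.
    return [Color(count, color) for color, count in counter.most_common()]

# Hypothetical counter standing in for collections.Counter(im.getdata()).
counter = Counter([(1, 2, 3), (4, 5, 6), (1, 2, 3)])
colors = to_color_objects(counter)
for obj in colors:
    print(obj.many, obj.col)
```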
The expression `int(''.join(map(str, color)))` does the same as the five string-manipulation lines in your first loop. It relies on each pixel being a tuple of integers: `map(str, ...)` converts each integer to a string, `''.join(...)` concatenates those strings, and `int(...)` turns the result back into a number.
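The sketch below checks that the one-liner really matches the question's `str`/`replace` chain. It also shows a caveat of that format: concatenating decimal digits is ambiguous, so different colors such as `(1, 23, 4)` and `(12, 3, 4)` both become `1234` and would be counted together. The function names are hypothetical, and `encode_packed` is an alternative encoding not taken from the original code; it assumes exactly three 8-bit channels (it would fail on RGBA pixels). Keeping the tuples themselves as `Counter` keys, as in the code above, avoids the problem entirely.

```python
def encode_like_question(pixel):
    # The question's approach: str((r, g, b)) gives "(r, g, b)";
    # strip ", ", "(" and ")" and parse the remaining digits.
    s = str(pixel).replace(", ", "").replace("(", "").replace(")", "")
    return int(s)

def encode_join(pixel):
    # The one-liner from this answer: stringify each channel and concatenate.
    return int(''.join(map(str, pixel)))

def encode_packed(pixel):
    # Hypothetical alternative: pack 8-bit R, G, B into one 24-bit integer,
    # which is unambiguous. Assumes exactly three channels in 0..255.
    r, g, b = pixel
    return (r << 16) | (g << 8) | b

print(encode_like_question((1, 23, 4)), encode_join((1, 23, 4)))  # 1234 1234
print(encode_join((1, 23, 4)), encode_join((12, 3, 4)))           # 1234 1234 (collision)
print(encode_packed((1, 23, 4)), encode_packed((12, 3, 4)))       # 71428 787204
```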
All of this together took about 0.5 seconds on my machine, excluding the printing (which is slow anyway).
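To see the difference on your own machine, a rough timing sketch like the one below may help. `count_with_list` is a simplified stand-in for the question's approach, not a copy of it: it keeps a list of `[color, count]` pairs and searches it linearly for every pixel, which is the same kind of repeated scanning that makes the original slow. The pixel data is synthetic (10,000 pixels drawn from roughly 1,000 random colors), and the printed times will vary by machine, so no expected numbers are given.

```python
import random
import time
from collections import Counter

def count_with_list(pixels):
    # Simplified stand-in for the question's approach: linear search
    # through a list of [color, count] pairs for every pixel.
    pairs = []
    for p in pixels:
        for pair in pairs:
            if pair[0] == p:
                pair[1] += 1
                break
        else:
            # No existing entry matched: start a new count for this color.
            pairs.append([p, 1])
    return {color: count for color, count in pairs}

def count_with_counter(pixels):
    # Counter is a dict subclass: one hash lookup per pixel.
    return Counter(pixels)

# Synthetic data: 10,000 pixels drawn from about 1,000 random RGB colors.
random.seed(0)
palette = [(random.randrange(256), random.randrange(256), random.randrange(256))
           for _ in range(1000)]
pixels = [random.choice(palette) for _ in range(10_000)]

for fn in (count_with_list, count_with_counter):
    start = time.perf_counter()
    result = fn(pixels)
    print(f'{fn.__name__}: {time.perf_counter() - start:.3f} s, {len(result)} colors')
```

Both functions return the same counts; only the lookup strategy differs.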