I have code that finds the top 10 colors in an image. I used PIL to get all of the colors, then processed them to find the 10 most common ones, but when I try an online tool, the results are completely different. Does PIL process colors differently, or what exactly is going on here?
Here is my code:
from PIL import Image

img = Image.open(image_url)
all_colors = img.getcolors(maxcolors=100000)  # List of (count, color) for every color in image
top_10_colors = all_colors[:10]  # Set first 10 colors as starting list
for color in all_colors[10:]:  # Start at index 10: the first 10 are already in the list
    amount = color[0]
    top_indexes = []
    for top_color in top_10_colors:
        top_index = top_color[0]
        top_indexes.append(top_index)
    if amount > min(top_indexes):
        # Replace the least frequent color currently in the top 10
        index = top_indexes.index(min(top_indexes))
        top_10_colors[index] = color
sorted_top_colors = sorted(top_10_colors, key=lambda x: x[0], reverse=True)
print(sorted_top_colors)
print(top_10_colors)
The website result (in greatest to least order):
rgb(49,49,49) rgb(211,211,211) rgb(67,67,67) rgb(71,71,71) rgb(61,61,61) rgb(79,79,79) rgb(166,82,19) rgb(70,70,70) rgb(65,65,65) rgb(29,28,28)
My result (greatest to least):
[(35, 35, 35), (41, 41, 41), (36, 36, 36), (34, 34, 34), (44, 44, 44), (33, 33, 33), (31, 31, 31), (42, 42, 42), (50, 50, 50), (32, 32, 32)]
Here is the image:
Here is the link of the website I used which gave me the result: https://www.imgonline.com.ua/eng/get-dominant-colors.php
So, I already gave the correct answer in the comments, but I was speculating completely in the dark, without knowing what the website is doing.
Now I can say, without a shadow of a doubt, that it is indeed just doing some K-Means.
First, a quick way to reproduce your result. It is quick coding-wise; CPU-wise it is very slow, but for a one-shot thing it is still faster than the time it would take me to work out how to do it with np.unique or something similar.
from collections import Counter
import numpy as np
from PIL import Image
arr = np.asarray(Image.open('image.jpg'))
cnt = Counter(tuple(x) for x in arr.reshape(-1, 3))  # count every (R, G, B) triple
cnt.most_common(10)  # -> your results
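For reference, here is a rough sketch of what the np.unique route could look like. The helper name `top_colors` and the tiny `demo` array are made up for illustration; on the real image you would pass `arr` instead.

```python
import numpy as np

def top_colors(arr, n=10):
    """Return the n most frequent colors of an (H, W, C) array as (color, count) pairs."""
    pixels = arr.reshape(-1, arr.shape[-1])        # one row per pixel
    # axis=0 makes np.unique treat each row (each color) as one item
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    order = np.argsort(counts)[::-1][:n]           # indices of the largest counts first
    return [(tuple(int(v) for v in colors[i]), int(counts[i])) for i in order]

# Hypothetical 2x3 image standing in for the real one
demo = np.array([[[35, 35, 35], [35, 35, 35], [41, 41, 41]],
                 [[35, 35, 35], [41, 41, 41], [255, 255, 255]]], dtype=np.uint8)
result = top_colors(demo, n=2)
print(result)
```

This should give the same ranking as the Counter version (up to the order of ties), while doing the counting inside NumPy rather than in a Python loop.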
Now, a way to reproduce the exact result of the "online" method:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=10, random_state=0, n_init="auto")
kmeans.fit(arr.reshape(-1,3))
kmeans.cluster_centers_
This shows the exact results of the "online" method (once rounded).
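The website lists its colors from greatest to least. Assuming that this ordering means cluster size (a guess on my part, the site does not say), the rounding and sorting could be sketched as below. The `centers` and `labels` arrays are made-up stand-ins for `kmeans.cluster_centers_` and `kmeans.labels_`, and `centers_by_cluster_size` is a hypothetical helper name.

```python
import numpy as np

def centers_by_cluster_size(centers, labels):
    """Round cluster centers to integer RGB and sort them by number of pixels."""
    counts = np.bincount(labels, minlength=len(centers))  # pixels per cluster
    order = np.argsort(counts)[::-1]                      # biggest cluster first
    rgb = np.rint(centers[order]).astype(int)             # round to nearest integer
    return [(tuple(int(v) for v in c), int(n)) for c, n in zip(rgb, counts[order])]

# Made-up values for illustration only
centers = np.array([[48.7, 49.2, 49.1],
                    [210.6, 211.3, 210.9],
                    [165.8, 82.4, 18.6]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 2])
ranked = centers_by_cluster_size(centers, labels)
print(ranked)
```

With the fitted model you would call it as `centers_by_cluster_size(kmeans.cluster_centers_, kmeans.labels_)`.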
So the short answer is "you are comparing two different things". Nothing says that those colors are the 10 most frequent ones. They are just the 10 you should keep if you wanted to keep only 10.
One way to describe it (since we are in the context of images): if you wanted a version of your image with a palette of only 10 colors, then those 10 colors are the ones you would want.
K-Means clusters all pixel values into 10 clusters, that is, 10 groups of values that we can consider close enough to each other (closer than they are to the values in other clusters). The 10 values it outputs are the 10 centers of those clusters. Note that we don't even have the guarantee that those values exist in the image: if you have a zillion (1,1,1) and a zillion (3,3,3) in a dataset, k-means may decide that the best way to summarize them is to put them in a single cluster whose center is (2,2,2), even though there isn't any (2,2,2) in the dataset.
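To make the (2,2,2) example concrete, here is a minimal sketch of the basic k-means (Lloyd) iteration in plain NumPy. It is not scikit-learn's implementation; the function name `lloyd_kmeans`, the toy data and the hand-picked starting centers are all assumptions made for the illustration.

```python
import numpy as np

def lloyd_kmeans(points, init_centers, n_iter=10):
    """Very small k-means: alternate 'assign to nearest center' and 'recompute means'."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n_points, k)
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Each center moves to the mean of its points (unchanged if it has none)
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy data: many (1,1,1), many (3,3,3), and a far-away group of (200,200,200)
points = np.array([(1, 1, 1)] * 50 + [(3, 3, 3)] * 50 + [(200, 200, 200)] * 50,
                  dtype=float)
centers, labels = lloyd_kmeans(points, init_centers=[points[0], points[-1]])
print(centers)

# The first center is not a color that occurs anywhere in the data
center_exists = bool((points == centers[0]).all(axis=1).any())
print(center_exists)
```

With only 2 clusters available, the two dark groups end up sharing one cluster, and its center is their mean rather than either of the original values.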
import matplotlib.pyplot as plt
plt.imshow(kmeans.labels_.reshape(arr.shape[:2]))  # (560, 1000) for this image
This shows how each pixel is mapped to one of the 10 clusters.
Or, using the centers as a palette (not exactly a palette: matplotlib has a cmap mechanism for that, but here it is faster to rebuild an RGB image):
plt.imshow(kmeans.cluster_centers_[kmeans.labels_.reshape(arr.shape[:2])]/255.0)
By comparison, here is the same thing using top10 (your result, the most frequent colors):
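top10 = np.array([color for color, _ in cnt.most_common(10)], dtype=int)  # assumed: built from cnt above
# Cast to int first, otherwise the uint8 subtraction wraps around
dists = ((arr[None, ...].astype(int) - top10[:, None, None, :])**2).sum(axis=3)
labels = np.argmin(dists, axis=0)  # index of the closest top10 color for each pixel
plt.imshow(top10[labels])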
So here we plot each pixel as one of the top10 colors: the one among the top10 that is closest to the original pixel.
As you can see, it is really not a good way to paint the image, because the top10 colors are all rather dark.
So, again, those are two very different things. Your code answers the question "what are the 10 most frequent colors?", without any attempt to ensure that the top10 are not all just variations of dark grey, or that white is in the top10. White is not one of the 10 most frequent colors of your image, so your code doesn't show it, and it shouldn't.
Yet if the question is "what are the 10 colors that are most useful to describe the image?", then you need a white, and you even need that beige of the paperclip holder. It is quite a rare color in the image, but we need to keep it, even if that means sacrificing one of the shades of grey. Those are all more frequent than the beige, but they are close enough to each other that we can drop one.
It is just not the same question that your website is answering.
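One way to put a number on "useful to describe the image" is the average squared distance between each pixel and the closest palette color, which is the quantity k-means tries to make small. The sketch below uses made-up pixels and a hand-picked second palette (not an actual k-means output); `palette_error` is a hypothetical helper name.

```python
from collections import Counter
import numpy as np

def palette_error(pixels, palette):
    """Mean squared distance from each pixel to its nearest palette color."""
    pixels = np.asarray(pixels, dtype=float)    # float avoids uint8 wrap-around
    palette = np.asarray(palette, dtype=float)
    # Squared distances, shape (n_pixels, n_palette_colors)
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).mean())

# Made-up pixels: lots of nearly identical dark greys and a little white
pixels = np.array([(30, 30, 30)] * 100 + [(31, 31, 31)] * 90
                  + [(32, 32, 32)] * 80 + [(255, 255, 255)] * 20)

# Palette A: the 2 most frequent colors (both dark grey)
most_frequent = [c for c, _ in Counter(map(tuple, pixels.tolist())).most_common(2)]
# Palette B: hand-picked, one grey plus the rare white
descriptive = [(31, 31, 31), (255, 255, 255)]

err_frequent = palette_error(pixels, most_frequent)
err_descriptive = palette_error(pixels, descriptive)
print(err_frequent, err_descriptive)
```

The most-frequent palette gets a much larger error here, because every white pixel has to be painted dark grey, while dropping one of the greys costs almost nothing.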