I am generating a word cloud from a term-frequency dictionary, using the following WordCloud parameters:
wordcloud = WordCloud(
    width=667,
    height=375,
    font_path="resources/NotoSerif-SemiBold.ttf",
    prefer_horizontal=1,
    max_words=20,
    background_color="whitesmoke",
    min_font_size=11,
    max_font_size=64,
).generate_from_frequencies(freqdict)
What I can't achieve is the colour and size scheme they request in their specs. Can anyone get to at least an approximation of what they want? Thank you.
Using color_func, as mentioned in the comments, and borrowing from the more common online examples of how to work with color_func (like this example or this Stack Overflow question), you can get pretty close to the requested color scheme, especially once you switch this over to your data and your font.
from wordcloud import WordCloud, get_single_color_func
import matplotlib.pyplot as plt


class GroupedColorFunc(object):
    """Create a color function object which assigns DIFFERENT SHADES of
    specified colors to words based on each word's font size, using the
    font-size ranges specified in a dictionary.

    Uses wordcloud.get_single_color_func

    Parameters
    ----------
    color_to_font_size : dict(str -> list(int))
        A dictionary that maps a color to a list containing the min and
        max font size (both inclusive).

    default_color : str
        Color that will be assigned to a word whose font size is not
        inside any range from color_to_font_size.
    """

    def __init__(self, color_to_font_size, default_color):
        # Pair each single-color function with its font-size range
        self.color_func_to_font_size = [
            (get_single_color_func(color), list(font_size_range))
            for (color, font_size_range) in color_to_font_size.items()]

        self.default_color_func = get_single_color_func(default_color)

    def get_color_func(self, word, font_size):
        """Returns the single_color_func whose range contains font_size"""
        try:
            color_func = next(
                color_func
                for (color_func, font_size_range) in self.color_func_to_font_size
                if font_size_range[0] <= font_size <= font_size_range[1])
        except StopIteration:
            color_func = self.default_color_func

        return color_func

    def __call__(self, word, font_size, **kwargs):
        return self.get_color_func(word, font_size)(word, font_size, **kwargs)

text = """The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""
# The text is lower-cased before the cloud is generated
wc = WordCloud(
    width=667,
    height=375,
    prefer_horizontal=1,
    # max_words=20,
    # font_path="resources/NotoSerif-SemiBold.ttf",
    background_color="whitesmoke",
    min_font_size=11,
    max_font_size=64).generate(text.lower())
color_to_font_size = {
    # color to inclusive [min, max] font size range
    '#663871': [33, 64],
    '#333333': [15, 32],
    '#6B6B6B': [0, 14]
}

# Words whose font size is not in any of the color_to_font_size ranges
# will be colored with this default color function
default_color = '#FFFFFF'
# Create a color function with multiple tones
grouped_color_func = GroupedColorFunc(color_to_font_size , default_color)
# Apply our color function
wc.recolor(color_func=grouped_color_func)
# Plot
plt.figure()
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
The big change between the two examples I linked to and this solution is that we pull in the font_size parameter through color_func. As luck would have it, this is one of the handful of parameters that are fed to this function. From the documentation:
Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word.
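To make that signature concrete, here is a minimal sketch of a color function that names only the arguments it needs and lets **kwargs absorb the rest. The name purple_if_large is hypothetical, the threshold of 33 is borrowed from the ranges used in this answer, and the hand-written calls assume the remaining parameters arrive as keyword arguments. No rendering is involved, so the mapping can be checked on its own:

```python
def purple_if_large(word, font_size, **kwargs):
    """Hypothetical color function: decide the color from font_size alone.

    **kwargs absorbs position, orientation, font_path and random_state,
    which this sketch does not need.
    """
    # Threshold 33 is borrowed from the ranges used in this answer.
    return "#663871" if font_size >= 33 else "#333333"


# Call it by hand, passing the remaining documented parameters as keywords
# (an assumption about how the library invokes the callable).
large = purple_if_large("python", font_size=48, position=(0, 0),
                        orientation=None, font_path=None, random_state=None)
small = purple_if_large("dutch", font_size=12, position=(10, 20),
                        orientation=None, font_path=None, random_state=None)
print(large, small)
```

A function like this can be passed as color_func in the same way as the class-based version above.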
Edit: I had some time to play around more with wordcloud and its parameters. After simplifying color_func to a plain function, so it's easier to understand without having to jump into class objects, and monkeying with scaling and whatnot, the following may be a bit more in the ballpark for your designer:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

color_to_font_size = {
    # color to font size ranges; range() excludes the upper bound
    '#663871': [33, 65],
    '#333333': [15, 33],
    '#6B6B6B': [11, 15]
}


def simple_color_func(word, font_size, position, orientation, font_path, random_state):
    # Fall back to black when the font size is outside every range
    out_color = '#000000'
    for color, font_size_range in color_to_font_size.items():
        if font_size in range(*font_size_range):
            out_color = color
    return out_color

text = """The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""
# The text is lower-cased before the cloud is generated
wc = WordCloud(
    color_func=simple_color_func,
    width=350,
    height=175,
    prefer_horizontal=1,
    # max_words=20,
    font_path="resources/NotoSerif-SemiBold.ttf",
    background_color="whitesmoke",
    min_font_size=11,
    max_font_size=64,
    relative_scaling=0.75,
    scale=5
).generate(text.lower())
# Plot
plt.figure()
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
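One caveat with simple_color_func: the range(*font_size_range) test only matches integer sizes inside the listed ranges, and anything else falls back to black. If that matters for your data, a possible variation is a breakpoint lookup with the standard-library bisect module, which assigns a color to every size. This is only a sketch: the names BREAKPOINTS, COLORS and threshold_color_func are hypothetical, and the breakpoints are assumed from the ranges above.

```python
from bisect import bisect_right

# Font sizes at which the color changes, taken from the ranges above.
BREAKPOINTS = [15, 33]
# One more color than breakpoints: below 15, 15 up to (not including) 33, 33 and up.
COLORS = ["#6B6B6B", "#333333", "#663871"]


def threshold_color_func(word=None, font_size=0, **kwargs):
    """Hypothetical variant: pick the color bucket that font_size falls in.

    bisect_right counts how many breakpoints are <= font_size, which is
    exactly the index of the matching color. Works for ints and floats.
    """
    return COLORS[bisect_right(BREAKPOINTS, font_size)]


for size in (11, 15, 32.5, 33, 64):
    print(size, threshold_color_func("word", font_size=size))
```

It can be swapped in with color_func=threshold_color_func. Adding a fourth tier only takes one more breakpoint and one more color.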