
Python Wordcloud: Help getting (near) what designers ask for


I am generating a word cloud from a term-frequency dictionary and got this result (image: word cloud of people mentioned on a given day) using the following WordCloud parameters:

from wordcloud import WordCloud

# freqdict is my term -> frequency dictionary
wordcloud = WordCloud(
    width=667,
    height=375,
    font_path="resources/NotoSerif-SemiBold.ttf",
    prefer_horizontal=1,
    max_words=20,
    background_color="whitesmoke",
    min_font_size=11,
    max_font_size=64,
).generate_from_frequencies(freqdict)

What I'm not achieving is the colour and size scheme they request in these specs (image: Color and size design guide). Can any of you get to at least some approximation of what they want? Thank you


Solution

  • Using color_func, as mentioned in the comments, and borrowing from the more common online examples of how to work with color_func (like this example or this Stack Overflow question), you can get fairly close to the requested color scheme, especially once you switch to your own data and font.

    from wordcloud import (WordCloud, get_single_color_func)
    import matplotlib.pyplot as plt
    
    
    class GroupedColorFunc:
        """Create a color function object which assigns DIFFERENT SHADES of
           specified colors to words based on their font size, using the
           font-size ranges specified in a dictionary.
    
           Uses wordcloud.get_single_color_func
    
           Parameters
           ----------
           color_to_font_size : dict(str -> list(int))
             A dictionary that maps a color to a [min, max] font-size range
             (both ends inclusive).
    
           default_color : str
             Color assigned to a word whose font size falls outside every
             range in color_to_font_size.
        """
    
        def __init__(self, color_to_font_size, default_color):
            # Pair a shading function for each color with its font-size range
            self.color_func_to_font_size = [
                (get_single_color_func(color), list(font_size_range))
                for (color, font_size_range) in color_to_font_size.items()]
    
            self.default_color_func = get_single_color_func(default_color)
    
        def get_color_func(self, word, font_size):
            """Return the single_color_func whose range contains font_size."""
            return next(
                (color_func for (color_func, (low, high)) in self.color_func_to_font_size
                 if low <= font_size <= high),
                self.default_color_func)
    
        def __call__(self, word, font_size, **kwargs):
            # WordCloud passes word, font_size, position, orientation,
            # font_path and random_state as keyword arguments
            return self.get_color_func(word, font_size)(word, font_size, **kwargs)
    
    
    text = """The Zen of Python, by Tim Peters
    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!"""
    
    # Since the text is small, collocations are turned off and the text is lower-cased
    wc = WordCloud(
            width=667,
            height=375,
            prefer_horizontal=1,
            #max_words=20,
            #font_path="resources/NotoSerif-SemiBold.ttf",
            background_color="whitesmoke",
            collocations=False,
            min_font_size=11,
            max_font_size=64).generate(text.lower())
    
    color_to_font_size = {
        # color to font size ranges
        '#663871': [33,64],    
        '#333333': [15,32],
        '#6B6B6B': [0,14]
    }
    
    # Words whose font size falls outside every range in color_to_font_size
    # will be colored with this default color function
    default_color = '#000000'
    
    # Create a color function with multiple tones
    grouped_color_func = GroupedColorFunc(color_to_font_size , default_color)
    
    # Apply our color function
    wc.recolor(color_func=grouped_color_func)
    
    # Plot
    plt.figure()
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    

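The class above colours words in shades rather than one exact colour because it goes through get_single_color_func. My reading of the wordcloud source (not its documented API, so treat it as an assumption) is that it keeps the hue and saturation of the colour you pass and picks a random brightness (HSV value) between 0.2 and 1 for each word. The stdlib-only sketch below imitates that so you can preview the shades a band would produce; shade_of is a hypothetical helper, not part of wordcloud. If the designer wants exact hex colours, returning the colour string directly instead of going through get_single_color_func avoids the shading.

```python
import colorsys
from random import Random


def shade_of(hex_color, random_state, low=0.2, high=1.0):
    """Return an 'rgb(r, g, b)' string with the hue and saturation of
    hex_color but a random brightness between low and high.

    Hypothetical helper: approximates, but does not import,
    wordcloud.get_single_color_func.
    """
    hex_color = hex_color.lstrip("#")
    # Split '663871' into its R, G, B bytes and normalise to 0..1
    r, g, b = (int(hex_color[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)
    # Keep hue and saturation, randomise only the brightness
    r, g, b = colorsys.hsv_to_rgb(h, s, random_state.uniform(low, high))
    return "rgb({:.0f}, {:.0f}, {:.0f})".format(r * 255, g * 255, b * 255)


rng = Random(42)  # fixed seed so the preview is reproducible
for _ in range(3):
    print(shade_of("#663871", rng))
```

Note that a pure grey such as '#333333' has zero saturation, so its "shades" are simply lighter or darker greys.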

    The main difference between this solution and the two examples I linked to is that the color function chooses a color from the font_size argument rather than from the word itself. Conveniently, font_size is one of the handful of arguments WordCloud passes to color_func. From the documentation:

    Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word.
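Since color_func only has to be a callable that accepts those keyword arguments and returns a PIL color, a plain function works as well as a class. Below is a minimal, stdlib-only sketch of one that returns a fixed hex colour per font-size band instead of random shades, which may sit closer to a strict design spec. The band thresholds and colours mirror the color_to_font_size dict above; the names FONT_SIZE_BANDS and banded_color_func are hypothetical.

```python
from bisect import bisect_right

# Hypothetical band table: (lowest font size in band, colour), sorted ascending.
# Thresholds and colours mirror the color_to_font_size dict in this answer.
FONT_SIZE_BANDS = [
    (0, "#6B6B6B"),   # small words: light grey
    (15, "#333333"),  # medium words: dark grey
    (33, "#663871"),  # large words: purple
]
_THRESHOLDS = [low for low, _ in FONT_SIZE_BANDS]


def banded_color_func(word=None, font_size=None, position=None,
                      orientation=None, font_path=None, random_state=None):
    """Return a fixed colour string for the band that font_size falls into.

    Accepts the keyword arguments WordCloud documents for color_func;
    only font_size is used here.
    """
    # bisect_right counts thresholds <= font_size; minus one gives the band
    index = bisect_right(_THRESHOLDS, font_size) - 1
    if index < 0:
        index = 0  # below the lowest threshold: use the smallest band
    return FONT_SIZE_BANDS[index][1]


# Call it the way WordCloud does: keyword arguments only.
for size in (11, 14, 15, 32, 33, 64):
    print(size, banded_color_func(word="python", font_size=size,
                                  position=(0, 0), orientation=None,
                                  font_path=None, random_state=None))
```

With wordcloud installed, it would be passed as WordCloud(color_func=banded_color_func, ...) or to wc.recolor(color_func=banded_color_func).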


    Edit: I had some more time to experiment with wordcloud and its parameters. Simplifying color_func to a plain function (so it's easier to follow without a class) and adjusting the scaling parameters, the following may be closer to what your designer wants:

    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    
    color_to_font_size = {
        # color to half-open font size ranges [low, high)
        '#663871': [33, 65],
        '#333333': [15, 33],
        '#6B6B6B': [11, 15]
    }
    
    def simple_color_func(word, font_size, position, orientation, font_path, random_state):
        """Return the color whose [low, high) range contains font_size, else black."""
        for color, (low, high) in color_to_font_size.items():
            if low <= font_size < high:
                return color
        return '#000000'
    
    
    
    text = """The Zen of Python, by Tim Peters
    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!"""
    
    # Since the text is small, collocations are turned off and the text is lower-cased
    wc = WordCloud(
            color_func=simple_color_func,
            width=350,
            height=175,
            prefer_horizontal=1,
            #max_words=20,
            font_path="resources/NotoSerif-SemiBold.ttf",
            background_color="whitesmoke",
            collocations=False,
            min_font_size=11,
            max_font_size=64,
            # how strongly relative frequency (vs. rank) drives font size
            relative_scaling=0.75,
            # lay out on a 350x175 canvas, draw at 5x (1750x875 px)
            scale=5
            ).generate(text.lower())
    
    # Plot
    plt.figure()
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    

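The edited simple_color_func treats each range as half-open, [low, high), so [11, 15] covers sizes 11 to 14 and [33, 65] covers 33 to 64, whereas the first version's ranges are inclusive at both ends. When translating a designer's size bands into such a dict, a quick check that the bands tile min_font_size to max_font_size with no gaps or overlaps can save a word silently falling back to the default colour. check_bands below is a hypothetical sketch of that check; it assumes WordCloud hands color_func integer font sizes, which is what I have seen in practice.

```python
def check_bands(color_to_font_size, min_font_size, max_font_size):
    """Return (gaps, overlaps) for half-open [low, high) font-size bands.

    gaps: sizes in [min_font_size, max_font_size] covered by no band.
    overlaps: sizes covered by more than one band.
    """
    gaps, overlaps = [], []
    for size in range(min_font_size, max_font_size + 1):
        hits = [color for color, (low, high) in color_to_font_size.items()
                if low <= size < high]
        if not hits:
            gaps.append(size)
        elif len(hits) > 1:
            overlaps.append(size)
    return gaps, overlaps


# The edit's bands, checked against min/max_font_size 11..64.
edit_bands = {'#663871': [33, 65], '#333333': [15, 33], '#6B6B6B': [11, 15]}
print(check_bands(edit_bands, 11, 64))  # ([], [])

# The first version's inclusive ranges misread as half-open leave gaps.
inclusive_bands = {'#663871': [33, 64], '#333333': [15, 32], '#6B6B6B': [0, 14]}
print(check_bands(inclusive_bands, 11, 64))  # ([14, 32, 64], [])
```

This is why the edit shifted each upper bound up by one when it switched from inclusive comparisons to range().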