
Group a list by a field, which in turn is a list


I have a class for movies:

class Movie:
    def __init__(self,
                 title: str,
                 director: str,
                 actors: list[str]):
        self.title = title
        self.director = director
        self.actors: list[str] = actors

And a list with 3 example movies:

movies = [Movie('Barton Fink', 'Joel Coen', ['John Turturro', 'John Goodman', 'Judy Davis']),
          Movie('The Big Lebowski', 'Joel Coen', ['Jeff Bridges', 'John Goodman', 'Steve Buscemi', 'John Turturro']),
          Movie('The Big Easy', 'Jim McBride', ['Dennis Quaid', 'Ellen Barkin', 'John Goodman']),
         ]

I want to use pandas to count how often each actor appears, e.g.:

John Goodman: 3
John Turturro: 2
Judy Davis: 1
...

For the directors, this works:

from pandas import DataFrame

df = DataFrame([vars(m) for m in movies])
grouped = df.groupby(['director']).size().sort_values(ascending=False)
print(grouped)

But for the actors it does not:

df = DataFrame([vars(m) for m in movies])
grouped = df.groupby(['actors']).size().sort_values(ascending=False)
print(grouped)

Error: (<class 'TypeError'>, TypeError("unhashable type: 'list'"), <traceback object at 0x00000274C91E4340>)

How can I group by actors?
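The error arises because `groupby` needs hashable group keys, and each cell of the `actors` column holds a Python list, which is mutable and therefore unhashable. A minimal illustration in plain Python (no pandas), using one cast list from the example:

```python
# Lists are mutable, so Python refuses to hash them; tuples of strings are hashable.
cast = ['John Turturro', 'John Goodman', 'Judy Davis']

try:
    hash(cast)
    list_is_hashable = True
except TypeError as exc:
    list_is_hashable = False
    print(exc)  # unhashable type: 'list'

# Converting to a tuple makes the value hashable, but grouping on it would
# then group by the whole cast, not by individual actors.
tuple_is_hashable = isinstance(hash(tuple(cast)), int)
print(list_is_hashable, tuple_is_hashable)  # False True
```

So turning the lists into tuples would silence the error but still not answer the question; the lists have to be flattened into one actor per row (or per item) before counting.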


Solution

  • You don't need pandas for this. You can use collections.Counter.

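    from collections import Counter
    
    # Flatten every movie's cast into one stream of names and count them
    actor_counts = Counter(actor for movie in movies for actor in movie.actors)
    for actor, count in actor_counts.most_common():
        print(f'{actor}: {count}')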
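If you would rather stay in pandas, one option is `DataFrame.explode`, which turns each element of a list-valued column into its own row so the column becomes groupable. A sketch assuming pandas 0.25 or newer (where `explode` exists) and the `Movie` class and data from the question:

```python
from pandas import DataFrame


class Movie:
    def __init__(self, title: str, director: str, actors: list[str]):
        self.title = title
        self.director = director
        self.actors: list[str] = actors


movies = [
    Movie('Barton Fink', 'Joel Coen', ['John Turturro', 'John Goodman', 'Judy Davis']),
    Movie('The Big Lebowski', 'Joel Coen', ['Jeff Bridges', 'John Goodman', 'Steve Buscemi', 'John Turturro']),
    Movie('The Big Easy', 'Jim McBride', ['Dennis Quaid', 'Ellen Barkin', 'John Goodman']),
]

df = DataFrame([vars(m) for m in movies])

# explode() gives each actor its own row (repeating title/director), so the
# 'actors' column now holds one hashable string per row and groups normally.
exploded = df.explode('actors')
actor_counts = exploded.groupby('actors').size().sort_values(ascending=False)

# Print in the "name: count" format from the question.
for actor, count in actor_counts.items():
    print(f'{actor}: {count}')
```

A shorter equivalent is `df['actors'].explode().value_counts()`, since `value_counts` already sorts in descending order. Actors tied on the same count may come out in any order in either version.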