
Create an edge list of films that share a genre


Hello everyone, I'm working on a project to analyze a website and build a network graph with Python. I chose the themoviedb.org website. The nodes are the movie ids, and a link between two nodes is a genre that both movies share: for example, node_A and node_B are linked if they have at least one genre in common. I extracted the nodes and put them in a list called nodes. For example, I have:

[
{'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
{'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
{'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
{'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
{'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'}
]
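The linking rule above amounts to a set intersection: two movies get one link for each genre id that appears in both records. A minimal sketch, assuming every node has exactly the two keys genre_ids_1 and genre_ids_2 shown above; genres_of and shared_genres are hypothetical helper names:

```python
def genres_of(node):
    """Collect a node's genre ids into a set (assumes the two keys shown above)."""
    return {node['genre_ids_1'], node['genre_ids_2']}


def shared_genres(node_a, node_b):
    """Return the genre ids two movies have in common; an empty set means no link."""
    return genres_of(node_a) & genres_of(node_b)


puss = {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'}
enforcer = {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'}
m3gan = {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'}

print(shared_genres(puss, enforcer))  # {'28'}
print(shared_genres(puss, m3gan))     # set()
```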

So, for example, I want to link the movie "Puss in Boots: The Last Wish" and the movie "The Enforcer", which share genre 28. The result I want is this edge list:

source      target               genre_ids
315162      846433               28
846433      315162               28
76600       536554               878
76600       653851               878
536554      76600                878
...

This is my code:

genres = [28, 12, 16, 35, 80, 99, 18, 10751, 14, 36, 27, 10402, 9648, 10749, 878, 10770, 53, 10752, 37]
edges = []
nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]
dictionary = {}

def get_edges():
    for i in nodes:
        if i["genre_ids_1"] in genres:
            dictionary.setdefault(i['genre_ids_1'], []).append(i['label'])
        elif i["genre_ids_2"] in genres:
            dictionary.setdefault(i['genre_ids_2'], []).append(i['label'])
        if i["genre_ids_1"] in dictionary:
            if i["label"] not in dictionary[i["genre_ids_1"]][0]:
                edges.append({"source": i["label"], "target": i["id"], "genre_id": dictionary[i["genre_ids_1"]][0]})
        elif i["genre_ids_2"] in dictionary:
            if i["label"] not in dictionary[i["genre_ids_2"]][1]:
                edges.append({"source": i["label"], "target": i["id"], "genre_id": dictionary[i["genre_ids_2"]][1]})
    print(edges)

get_edges()

How can I do this?
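One concrete reason the code in the question never appends an edge: every genre id in nodes is a string ('28'), while genres holds integers (28), so each `in genres` test is False. A quick check, with one possible fix of converting the id to int before the lookup (converting genres to strings instead would work equally well):

```python
genres = [28, 12, 16, 35, 80, 99, 18, 10751, 14, 36, 27, 10402, 9648, 10749, 878, 10770, 53, 10752, 37]
node = {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'}

# The string '28' is never equal to the integer 28, so membership fails.
print(node['genre_ids_1'] in genres)       # False

# Converting the id to int first makes the comparison meaningful.
print(int(node['genre_ids_1']) in genres)  # True
```

Even with the types aligned, the if/elif in that loop records at most one genre per movie, so a movie would only ever be grouped under its first recognized genre.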


Solution

  • First, construct a dict nodes_by_genre that maps each genre id to its associated nodes (dicts). Then use itertools.permutations to generate the directed edges for each genre. Finally, format each directed edge as a (source label, target label, genre id) tuple for later use.

    Note: if you want undirected edges, use itertools.combinations instead of permutations, so that each pair appears only once.

    from pprint import pprint
    from itertools import permutations
    
    nodes = [
        {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'}, 
        {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'}, 
        {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'}, 
        {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'}, 
        {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
    ]
    
    def get_edges(nodes):
        # Map each genre id to the list of nodes that have it.
        nodes_by_genre = {}
        for node in nodes:
            nodes_by_genre.setdefault(node['genre_ids_1'], []).append(node)
            nodes_by_genre.setdefault(node['genre_ids_2'], []).append(node)
        edges = []
        for genre, genre_nodes in nodes_by_genre.items():
            # Every ordered pair of distinct nodes in a genre is one directed edge.
            node_pairs = permutations(genre_nodes, 2)
            new_edges = ((node1['label'], node2['label'], genre) for node1, node2 in node_pairs)
            edges.extend(new_edges)
        return edges
        
    edges = get_edges(nodes)
    pprint(edges)
    

    Output:

    [('Puss in Boots: The Last Wish', 'The Enforcer', '28'),
     ('The Enforcer', 'Puss in Boots: The Last Wish', '28'),
     ('M3GAN', 'Avatar: The Way of Water', '878'),
     ('M3GAN', 'Devotion', '878'),
     ('Avatar: The Way of Water', 'M3GAN', '878'),
     ('Avatar: The Way of Water', 'Devotion', '878'),
     ('Devotion', 'M3GAN', '878'),
     ('Devotion', 'Avatar: The Way of Water', '878')]
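The answer labels edges with movie titles, while the question asked for source/target movie ids. A minimal variant of the same permutations approach that emits (source_id, target_id, genre_id) triples and prints them in the question's column layout; the function name get_id_edges and the print format are illustrative choices, not part of the original answer:

```python
from itertools import permutations

nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]


def get_id_edges(nodes):
    """Return directed (source_id, target_id, genre_id) edges between movies sharing a genre."""
    ids_by_genre = {}
    for node in nodes:
        # dict.fromkeys drops a repeated genre id while keeping field order,
        # so a movie is never listed twice under one genre (no self-loops).
        for genre in dict.fromkeys((node['genre_ids_1'], node['genre_ids_2'])):
            ids_by_genre.setdefault(genre, []).append(node['id'])
    edges = []
    for genre, ids in ids_by_genre.items():
        # Each ordered pair of distinct movies in the same genre is one edge.
        edges.extend((source, target, genre) for source, target in permutations(ids, 2))
    return edges


edges = get_id_edges(nodes)
print(f"{'source':<12}{'target':<12}genre_ids")
for source, target, genre in edges:
    print(f"{source:<12}{target:<12}{genre}")
```

The printed rows contain the pairs from the question's table (for example 315162 to 846433 and 846433 to 315162 for genre 28), possibly in a different row order.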
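Following the note about undirected edges: a sketch of the same grouping with itertools.combinations, which yields each unordered pair once, so the eight directed edges in the output above collapse to four. Labels are kept as in the answer; the function name get_undirected_edges is illustrative:

```python
from itertools import combinations

nodes = [
    {'id': 315162, 'label': 'Puss in Boots: The Last Wish', 'genre_ids_1': '16', 'genre_ids_2': '28'},
    {'id': 536554, 'label': 'M3GAN', 'genre_ids_1': '878', 'genre_ids_2': '27'},
    {'id': 76600, 'label': 'Avatar: The Way of Water', 'genre_ids_1': '878', 'genre_ids_2': '12'},
    {'id': 653851, 'label': 'Devotion', 'genre_ids_1': '10752', 'genre_ids_2': '878'},
    {'id': 846433, 'label': 'The Enforcer', 'genre_ids_1': '28', 'genre_ids_2': '53'},
]


def get_undirected_edges(nodes):
    """Return one (label_a, label_b, genre_id) edge per unordered pair sharing a genre."""
    nodes_by_genre = {}
    for node in nodes:
        nodes_by_genre.setdefault(node['genre_ids_1'], []).append(node)
        nodes_by_genre.setdefault(node['genre_ids_2'], []).append(node)
    edges = []
    for genre, genre_nodes in nodes_by_genre.items():
        # combinations() emits (a, b) but never also (b, a): one edge per pair.
        edges.extend((a['label'], b['label'], genre) for a, b in combinations(genre_nodes, 2))
    return edges


undirected = get_undirected_edges(nodes)
for edge in undirected:
    print(edge)
```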