python pandas networkx igraph network-analysis

Transform a dataframe for network analysis using pandas


I have a data frame of online game matches with two relevant columns: the ID of each match and the ID of each player who participated in that match. For instance:

match_id player_id
0 1
0 2
0 3
0 4
0 5
1 6
1 1
1 7
1 8
1 2

Here, player_id uniquely identifies a player and match_id identifies a match. Each match_id is repeated a fixed number of times (say, 5), since 5 is the maximum number of players that can participate in a match. Each row therefore records that a given player participated in a given match.
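
To make the example reproducible, the table above can be built as a DataFrame. This is a minimal sketch; the variable name df is an assumption, chosen to match the snippets in the answer.

```python
import pandas as pd

# Sample data from the table above: two matches with five players each
df = pd.DataFrame({
    "match_id":  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "player_id": [1, 2, 3, 4, 5, 6, 1, 7, 8, 2],
})

# Number of rows (players) recorded for each match
players_per_match = df.groupby("match_id")["player_id"].nunique()
print(players_per_match)
```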

As the table above shows, two players can play together more than once (or never). That is why I want to transform this initial data frame into an adjacency matrix, in which the entry at the intersection of a row and a column gives the number of matches the two players co-played. Another option would be a data frame like the following:

player_1 player_2 coplays_number
1 2 2
1 3 1
1 4 1
1 10 0
1 5 1
... ... ...

My task is to prepare the data for further analysis of the co-play network using igraph or networkx. I also want the network to be weighted: the weight of an edge should be the number of matches co-played by the two nodes (players). An edge means that two users have played together, i.e. they participated in the same match once, or they played together in two or more matches (like players 1 and 2 in the example above).

My question is: how can I use pandas and numpy to transform my initial data frame into network data that igraph or networkx functions accept as an argument? Or can igraph or networkx functions work with the initial data frame directly, without any data manipulation?

Thanks in advance for your answers and recommendations!


Solution

  • I think you don't need networkx if you use permutations from itertools and pd.crosstab:

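    from itertools import permutations
    
    import pandas as pd
    
    # All ordered pairs of players within each match, one pair per row
    pairs = (df.groupby('match_id')['player_id']
               .apply(lambda x: list(permutations(x, r=2)))
               .explode())
    # Count how often each (Player 1, Player 2) pair occurs
    adj = pd.crosstab(pairs.str[0], pairs.str[1],
                      rownames=['Player 1'], colnames=['Player 2'])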
    

    Output:

    >>> adj
    Player 2  1  2  3  4  5  6  7  8
    Player 1                        
    1         0  2  1  1  1  1  1  1
    2         2  0  1  1  1  1  1  1
    3         1  1  0  1  1  0  0  0
    4         1  1  1  0  1  0  0  0
    5         1  1  1  1  0  0  0  0
    6         1  1  0  0  0  0  1  1
    7         1  1  0  0  0  1  0  1
    8         1  1  0  0  0  1  1  0
    

    If you want a flat list (not an adjacency matrix), use combinations:

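    from itertools import combinations
    
    import pandas as pd
    
    # Each pair once per match. The order inside a pair follows the row
    # order within the match, so (6, 1) and (1, 6) would count separately.
    pairs = (df.groupby('match_id')['player_id']
               .apply(lambda x: frozenset(combinations(x, r=2)))
               .explode().value_counts())
    
    # Split the tuple index into two columns next to the counts
    coplays = pd.DataFrame({'Player 1': pairs.index.str[0],
                            'Player 2': pairs.index.str[1],
                            'coplays_number': pairs.tolist()})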
    

    Output:

    >>> coplays
        Player 1  Player 2  coplays_number
    0          1         2               2
    1          2         4               1
    2          6         2               1
    3          8         2               1
    4          7         2               1
    5          1         7               1
    6          6         7               1
    7          1         8               1
    8          6         8               1
    9          6         1               1
    10         3         5               1
    11         1         3               1
    12         2         5               1
    13         4         5               1
    14         2         3               1
    15         1         4               1
    16         1         5               1
    17         3         4               1
    18         7         8               1
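
To hand the result to networkx, one option is to normalize each pair so that the smaller ID comes first, count the pairs, and pass the resulting edge list to nx.from_pandas_edgelist. The following is a sketch under assumptions: the sample data is rebuilt inline, and the column names player_1, player_2 and weight, as well as the variable names edges and G, are illustrative and not part of the answer above.

```python
from itertools import combinations

import networkx as nx
import pandas as pd

# Sample data from the question (assumed to match the original df)
df = pd.DataFrame({
    "match_id":  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "player_id": [1, 2, 3, 4, 5, 6, 1, 7, 8, 2],
})

# Sorting players inside each match makes every pair (low_id, high_id),
# so the same two players always produce the same tuple
pairs = (df.groupby("match_id")["player_id"]
           .apply(lambda x: list(combinations(sorted(x), r=2)))
           .explode())

# One row per pair, with the number of shared matches as the edge weight
edges = (pd.DataFrame(pairs.tolist(), columns=["player_1", "player_2"])
           .groupby(["player_1", "player_2"])
           .size()
           .reset_index(name="weight"))

# Undirected weighted graph; 'weight' is stored as an edge attribute
G = nx.from_pandas_edgelist(edges, source="player_1", target="player_2",
                            edge_attr="weight")

print(G[1][2]["weight"])    # shared matches of players 1 and 2
print(G.number_of_edges())  # number of distinct co-playing pairs
```

Pairs that never co-played do not appear in edges, which corresponds to having no edge between those players. For igraph, the same rows could probably be passed as tuples to Graph.TupleList with weights=True, though that path is not shown or tested here.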