I have a data frame of online game matches with two relevant columns: match IDs and the IDs of the players who participated in each match. For instance:
match_id | player_id |
---|---|
0 | 1 |
0 | 2 |
0 | 3 |
0 | 4 |
0 | 5 |
1 | 6 |
1 | 1 |
1 | 7 |
1 | 8 |
1 | 2 |
Here, `player_id` is a unique identifier of a player. Meanwhile, `match_id` is the ID of a match played, and it is repeated a fixed number of times (say, 5), since 5 is the maximum number of players who can participate in a match. So each row pairs a `match_id` with a `player_id`, meaning that this player participated in that match.
As the table above shows, two players can play together more than once (or never). That is why I want to transform this initial data frame into an adjacency matrix in which the intersection of a row and a column gives the number of matches the two players co-played. Another option would be to create a data frame like the following:
player_1 | player_2 | coplays_number |
---|---|---|
1 | 2 | 2 |
1 | 3 | 1 |
1 | 4 | 1 |
1 | 10 | 0 |
1 | 5 | 1 |
... | ... | ... |
My task is to prepare the data for further analysis of the co-play network using `igraph` or `networkx`. I also want the network to be weighted: the weight of an edge is the number of matches co-played by its two nodes (players). An edge means that two users have played together at least once, whether in a single match or in two or more matches (like players 1 and 2 in the example above).
My question is: using `pandas` and `numpy`, how can I transform my initial data frame into network data that `igraph` or `networkx` functions accept as an argument? Or do I not need any data manipulation at all because `igraph` or `networkx` functions can work with the initial data frame directly?
Thanks in advance for your answers and recommendations!
I think you don't need `networkx` if you use `permutations` from `itertools` and `pd.crosstab`:
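```python
from itertools import permutations
import pandas as pd

# Every ordered pair of players within each match in df, one pair per row
pairs = (df.groupby('match_id')['player_id']
           .apply(lambda x: list(permutations(x, r=2)))
           .explode())
# Count how often each ordered pair occurs across matches
adj = pd.crosstab(pairs.str[0], pairs.str[1],
                  rownames=['Player 1'], colnames=['Player 2'])
```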
Output:
```
>>> adj
Player 2  1  2  3  4  5  6  7  8
Player 1
1         0  2  1  1  1  1  1  1
2         2  0  1  1  1  1  1  1
3         1  1  0  1  1  0  0  0
4         1  1  1  0  1  0  0  0
5         1  1  1  1  0  0  0  0
6         1  1  0  0  0  0  1  1
7         1  1  0  0  0  1  0  1
8         1  1  0  0  0  1  1  0
```
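Since the question also mentions `numpy`, here is a minimal sketch of a different route to the same matrix: build a match-by-player incidence matrix and multiply its transpose by itself. This is not the method used above; it swaps in a matrix product. The sample `df` reproduces the example from the question, the names `incidence` and `counts` are only illustrative, and the `.clip(upper=1)` assumes a player should count at most once per match.

```python
import numpy as np
import pandas as pd

# Sample data from the question (two matches, five players each)
df = pd.DataFrame({
    "match_id": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "player_id": [1, 2, 3, 4, 5, 6, 1, 7, 8, 2],
})

# Rows are matches, columns are players; 1 if the player took part.
# clip(upper=1) is an assumption: duplicated rows count once per match.
incidence = pd.crosstab(df["match_id"], df["player_id"]).clip(upper=1)

# (players x matches) @ (matches x players) = shared-match counts
counts = incidence.to_numpy().T @ incidence.to_numpy()

# The diagonal holds each player's own match count; zero it out
np.fill_diagonal(counts, 0)

adj = pd.DataFrame(counts, index=incidence.columns, columns=incidence.columns)
print(adj)
```

The result is symmetric by construction, so it does not depend on the order in which players are listed within a match.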
If you want a flat list (not an adjacency matrix), use `combinations`:
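```python
from itertools import combinations
import pandas as pd

# Unordered pairs per match, counted across matches. Pair order follows the
# row order in df; use sorted(x) if that order can differ between matches.
pairs = (df.groupby('match_id')['player_id']
           .apply(lambda x: frozenset(combinations(x, r=2)))
           .explode().value_counts())
# Split each pair tuple into two columns
coplays = pd.DataFrame({'Player 1': pairs.index.str[0],
                        'Player 2': pairs.index.str[1],
                        'coplays_number': pairs.tolist()})
```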
Output:
```
>>> coplays
    Player 1  Player 2  coplays_number
0          1         2               2
1          2         4               1
2          6         2               1
3          8         2               1
4          7         2               1
5          1         7               1
6          6         7               1
7          1         8               1
8          6         8               1
9          6         1               1
10         3         5               1
11         1         3               1
12         2         5               1
13         4         5               1
14         2         3               1
15         1         4               1
16         1         5               1
17         3         4               1
18         7         8               1
```
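To connect this back to the original question, here is a hedged sketch of how such an edge list could be handed to `networkx` as a weighted graph via `nx.from_pandas_edgelist`. It is self-contained, so it rebuilds the edge list with a `Counter` over sorted pairs instead of reusing the code above; the sample `df` and the variable names `counts` and `G` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

import networkx as nx
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "match_id": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "player_id": [1, 2, 3, 4, 5, 6, 1, 7, 8, 2],
})

# Count each unordered pair once per match; sorting makes (1, 2) and (2, 1)
# the same key regardless of row order
counts = Counter()
for _, players in df.groupby("match_id")["player_id"]:
    counts.update(combinations(sorted(set(players)), 2))

coplays = pd.DataFrame(
    [(a, b, n) for (a, b), n in counts.items()],
    columns=["Player 1", "Player 2", "coplays_number"],
)

# Undirected graph; the co-play count is stored as an edge attribute
G = nx.from_pandas_edgelist(
    coplays, source="Player 1", target="Player 2", edge_attr="coplays_number"
)
print(G[1][2]["coplays_number"])
```

Pairs that never played together simply have no edge. Weighted `networkx` functions can then be pointed at the attribute by name, for example `G.degree(weight="coplays_number")`.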