I have a data frame from the LASTFM dataset with columns: user_id | friend_id like so:
uid | fid
346 | 23
355 | 48
and I would like to have a relation between users as a third columns (kind of an adjacency vector), such as:
uid1 | uid2 | friends
23 | 48 | 0
23 | 56 | 0
23 | .. | 0
23 | 346 | 1
48 | 23 | 0
48 | 56 | 0
48 | .. | 0
48 | 346 | 0
48 | 355 | 1
23 | .. | 0
23 | 346 | 1
346 | 23 | 1
I have tried working with merge, join, lambda but have so far been unsuccessful. Any help would be appreciated.
The strategy here is two step. First create the UID cross product dataset, then attach the friends indicator:
Create the UID cross product by first taking the union of pairs from original dataset, along with their inverses. We'll create an intermediate dataset, friends
, which we'll use later in the process to indicate which UIDs are friends:
pairs = df.rename(columns={'uid': 'uid1', 'fid': 'uid2'})
friends = pd.concat([pairs, pairs.rename(columns={'uid1': 'uid2', 'uid2':'uid1'})])
uids = friends.uid1.drop_duplicates().to_frame(name='uid')
uid
0 346
1 355
0 23
1 48
Then, append a dummy merge key to allow a cross product merge:
uids['dummy_key'] = 1
uids = uids.merge(uids, on='dummy_key', suffixes=('1', '2'))[['uid1', 'uid2']]
uid1 uid2
0 346 346
1 346 355
2 346 23
3 346 48
4 355 346
5 355 355
...
Now, we merge on the friends dataset, attaching an indicator column, to start our adjacency list:
adj = uids.merge(friends, on=['uid1', 'uid2'], how='left', indicator=True)
uid1 uid2 _merge
0 346 346 left_only
1 346 355 left_only
2 346 23 both
3 346 48 left_only
4 355 346 left_only
5 355 355 left_only
...
Finally, we code the _merge
indicator into the friend
column:
adj['friends'] = adj.apply(lambda row: 1 if row['_merge'] == 'both' else 0, axis=1)
adj = adj[['uid1', 'uid2', 'friends']]
uid1 uid2 friends
0 346 346 0
1 346 355 0
2 346 23 1
3 346 48 0
4 355 346 0
5 355 355 0