pandas, string-concatenation, pairing

How can I aggregate strings from many cells into one cell?


Say I have two classes with a handful of students each, and I want to list every possible pairing of students within each class. In my original data, I have one row per student.

What's the easiest way in Pandas to turn this dataset

   Class Students
0      1        A
1      1        B
2      1        C
3      1        D
4      1        E
5      2        F
6      2        G
7      2        H

into this one, with one row per pair of students in the same class?

    Class Students
0       1      A,B
1       1      A,C
2       1      A,D
3       1      A,E
4       1      B,C
5       1      B,D
6       1      B,E
7       1      C,D
8       1      C,E
9       1      D,E
10      2      F,G
11      2      F,H
12      2      G,H

Solution

  • Try this, which builds the pairs for each class with itertools.combinations:

    import itertools
    import pandas as pd

    # Rebuild the sample data from the question: one row per student.
    cla = [1, 1, 1, 1, 1, 2, 2, 2]
    s = ["A", "B", "C", "D", "E", "F", "G", "H"]
    df = pd.DataFrame(cla, columns=["Class"])
    df["Students"] = s


    def create_combos(list_students):
        # Every unordered pair of students, formatted as "X,Y".
        combos = itertools.combinations(list_students, 2)
        return [str(a) + "," + str(b) for a, b in combos]

    def iterate_df(class_id):
        # Select one class, pair up its students, and repeat the class id
        # once per pair so both lists have the same length.
        df_temp = df.loc[df["Class"] == class_id]
        list_student = list(df_temp["Students"])
        list_combos = create_combos(list_student)
        list_id = [class_id for _ in list_combos]
        return list_id, list_combos

    # unique() keeps the classes in order of first appearance; a set does
    # not guarantee any order.
    list_classes = df["Class"].unique()
    new_id = []
    new_combos = []
    for idx in list_classes:
        tmp_id, tmp_combo = iterate_df(idx)
        new_id += tmp_id
        new_combos += tmp_combo

    # Assemble the result: one row per pair, labelled with its class.
    new_df = pd.DataFrame(new_id, columns=["Class"])
    new_df["Students"] = new_combos

    print(new_df)
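
The same idea can probably be written more compactly by letting groupby split the classes. This is only a sketch of an alternative, not a replacement for the answer above. It assumes the column names `Class` and `Students` from the question and that the student names are strings; the variable names `rows` and `pairs` are made up for illustration.

```python
import itertools

import pandas as pd

# Sample data from the question: one row per student.
df = pd.DataFrame({
    "Class": [1, 1, 1, 1, 1, 2, 2, 2],
    "Students": ["A", "B", "C", "D", "E", "F", "G", "H"],
})

# Iterating over the grouped column yields (class_id, Series of students).
# combinations(..., 2) produces each unordered pair exactly once.
rows = [
    (class_id, ",".join(pair))
    for class_id, group in df.groupby("Class")["Students"]
    for pair in itertools.combinations(group, 2)
]

# One row per pair, labelled with the class it came from.
pairs = pd.DataFrame(rows, columns=["Class", "Students"])
print(pairs)
```

A class with only one student contributes no rows here, since it has no pairs. groupby sorts by the class key by default, so the classes come out in ascending order.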