
How to split a column into several columns by taking the string values as column headers?


This is my dataset:

| Name     | Dept     | Project area/areas interested     | 
| -------- | -------- |-----------------------------------|
| Joe      | Biotech  | Cell culture//Bioinfo//Immunology |
| Ann      | Biotech  | Cell culture                      |
| Ben      | Math     | Trigonometry//Algebra             |
| Keren    | Biotech  | Microbio                          |
| Alice    | Physics  | Optics                            |

This is how I want my result:

| Name     | Dept     |Cell culture|Bioinfo|Immunology|Trigonometry|Algebra|Microbio|Optics|
| -------- | -------- |------------|-------|----------|------------|-------|--------|------|
| Joe      | Biotech  |     1      |   1   |    1     |      0     |   0   |   0    | 0    |
| Ann      | Biotech  |     1      |   0   |    0     |      0     |   0   |   0    | 0    |
| Ben      | Math     |     0      |   0   |    0     |      1     |   1   |   0    | 0    |
| Keren    | Biotech  |     0      |   0   |    0     |      0     |   0   |   1    | 0    |
| Alice    | Physics  |     0      |   0   |    0     |      0     |   0   |   0    | 1    |

Not only do I have to split the last column into separate columns based on the values in its rows, I also have to split the values that are separated by "//". The values in the resulting dataframe then have to be 1 or 0 (int). I've been stuck on this for a while now (-_-;)


Solution

  • You can use pandas.concat in combination with Series.str.get_dummies (the string accessor method, which accepts a separator) like this:

    pd.concat([df[["Name", "Dept"]], df["Project area/areas interested"].str.get_dummies(sep='//')], axis=1)
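For reference, here is a minimal end-to-end sketch of the same approach. It rebuilds the sample data from the question by hand, so it assumes the column names contain no stray whitespace. One thing to be aware of: `str.get_dummies` returns the indicator columns in alphabetical order, so the reordering step below (the `order` variable is my own addition, not part of the one-liner above) is only needed if you want the columns in order of first appearance, as in the desired result.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Joe", "Ann", "Ben", "Keren", "Alice"],
    "Dept": ["Biotech", "Biotech", "Math", "Biotech", "Physics"],
    "Project area/areas interested": [
        "Cell culture//Bioinfo//Immunology",
        "Cell culture",
        "Trigonometry//Algebra",
        "Microbio",
        "Optics",
    ],
})

col = "Project area/areas interested"

# One 0/1 indicator column per distinct value found between the "//" separators.
dummies = df[col].str.get_dummies(sep="//")

# Optional: reorder the indicator columns by first appearance in the data
# (get_dummies sorts them alphabetically).
order = list(df[col].str.split("//").explode().unique())
dummies = dummies[order]

# Put the untouched columns and the indicator columns side by side.
result = pd.concat([df[["Name", "Dept"]], dummies], axis=1)
print(result)
```

If the alphabetical column order is fine for you, drop the two `order` lines and the one-liner above is all you need.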