I have the following dataframe, mapping a one-to-many relationship between "courses" and "lessons":
   course_id       course_name  lesson_id     lesson_title
0          0          Learn C#          1              foo
1          0          Learn C#          2              bar
2          0          Learn C#          3              baz
3          1  Origami together          1        the crane
4          1  Origami together          2  crease patterns
5          2        WIP course          1        the first
How do I format it so that:
each lesson row is within the span of its belonging course row
the lesson_id and lesson_title columns are under the span of a common lessons column
as shown below:
                                lessons
   course_id  course_name       id  title
0  0          Learn C#          1   foo
1                               2   bar
2                               3   baz
3  1          Origami together  1   the crane
4                               2   crease patterns
5  2          WIP course        1   the first
and producing an output similar to this when exported to Excel:
Looking at similar questions, I found that the accepted answers rely on a MultiIndex, but in this case the first level of the index would have to include all course-related columns.
On top of that, the starting table is actually dynamically generated from the corresponding Course and Lesson dataclasses, so I fear this approach wouldn't scale well if I were to add attributes to the Course class.
Ideally I would index by course_id and lesson_id, then specify which columns are indexed by the former or the latter, thus avoiding course attributes being duplicated for each lesson.
Is there a way to achieve that?
If you need a MultiIndex in both the index and the columns, you can use:
out = df.set_index(['course_id','course_name'])
out.columns = out.columns.str.split('_', expand=True)
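For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the dataframe from the question and applies the same two steps. The only change is `n=1` in the split, which is my own addition rather than part of the snippet above: it assumes you may later have lesson columns with more than one underscore (the name `lesson_due_date` in the comment is hypothetical) and want them to stay on two header levels.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "course_id": [0, 0, 0, 1, 1, 2],
    "course_name": ["Learn C#"] * 3 + ["Origami together"] * 2 + ["WIP course"],
    "lesson_id": [1, 2, 3, 1, 2, 1],
    "lesson_title": ["foo", "bar", "baz", "the crane", "crease patterns", "the first"],
})

# Course attributes become index levels, so repeated values are
# displayed only once per course.
out = df.set_index(["course_id", "course_name"])

# Split "lesson_id" -> ("lesson", "id") to get a two-level column header.
# n=1 splits at the first underscore only, so a hypothetical column
# "lesson_due_date" would become ("lesson", "due_date").
out.columns = out.columns.str.split("_", n=1, expand=True)

print(out)
```

Note that the shared header comes out as `lesson` (the column prefix), not `lessons`; rename the top level afterwards if the plural matters.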
If you need row spans for both index levels, here is a trick: add a helper column of empty strings and use it as a third index level:
out = df.assign(**{'':''}).set_index(['course_id','course_name', ''])
out.columns = out.columns.str.split('_', expand=True)
print(out)
                            lesson
                            id  title
course_id course_name
0         Learn C#          1   foo
                            2   bar
                            3   baz
1         Origami together  1   the crane
                            2   crease patterns
2         WIP course        1   the first
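Regarding the concern about adding attributes to the Course class: one possible way to avoid hard-coding the index columns is to pick them by name prefix. The sketch below assumes every course attribute is exported as a column starting with `course_`; the helper name `nest_by_course` and the extra `course_level` column are hypothetical and only serve as illustration.

```python
import pandas as pd

def nest_by_course(df, prefix="course_"):
    """Index by every column starting with `prefix`, plus an empty helper level."""
    # Assumption: all course attributes share the same column prefix.
    course_cols = [c for c in df.columns if c.startswith(prefix)]
    # The empty-named helper column is the same trick as above.
    out = df.assign(**{"": ""}).set_index(course_cols + [""])
    # Remaining columns, e.g. "lesson_id", become ("lesson", "id").
    out.columns = out.columns.str.split("_", n=1, expand=True)
    return out

df = pd.DataFrame({
    "course_id": [0, 0, 0, 1, 1, 2],
    "course_name": ["Learn C#"] * 3 + ["Origami together"] * 2 + ["WIP course"],
    "lesson_id": [1, 2, 3, 1, 2, 1],
    "lesson_title": ["foo", "bar", "baz", "the crane", "crease patterns", "the first"],
})

out = nest_by_course(df)

# A new course attribute is picked up without changing the function.
out2 = nest_by_course(df.assign(course_level="beginner"))
print(out2.index.names)
```

This only works as long as the prefix convention holds; if the dataclass field names do not map cleanly to prefixed columns, passing the list of index columns explicitly is probably safer.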
If you need to remove the empty third column (the helper level) from the Excel file:
import xlwings as xw

file = 'out.xlsx'
out.to_excel(file)

# Reopen the workbook and delete column C, which holds the empty helper level
wb = xw.Book(file)
wb.sheets['Sheet1'].range('C:C').delete()
wb.save(file)