I have a data in CSV in below format:
"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"
I am reading it to Pandas DataFrame:
df = pd.read_csv(csvFile, index_col=False, dtype=object, header=None)
print(df)
Output is following:
0 1 2
0 /some/page-1.md title My title 1
1 /some/page-1.md description My description 1
2 /some/page-1.md type Tutorial
3 /some/page-1.md index True
4 /some/page-2.md title My title 2
5 /some/page-2.md description My description 2
6 /some/page-2.md type Tutorial
7 /some/page-2.md index False
8 /some/page-2.md custom_1 abc
9 /some/page-3.md title My title 3
10 /some/page-3.md description My description 3
11 /some/page-3.md type Tutorial
12 /some/page-3.md index True
13 /some/page-3.md custom_2 def
I'd like to transform it to DataFrame in below format, where first header is "file" and values are from column 0. Other headers are taken from column 1 and values from column 2:
file title description type index custom_1 custom_2
0 /some/page-1.md My title 1 My description 1 Tutorial True NaN NaN
1 /some/page-2.md My title 2 My description 2 Tutorial False abc NaN
2 /some/page-3.md My title 3 My description 3 Tutorial True NaN def
Is there a way to do this with Pandas?
I have changed your first column names to file, header, and value. So, can easily handle what u want. You need to use pivot_table
method to reach your goal. The final code is shown in below.
df = pd.DataFrame(data, columns=["file", "header", "value"])
result = df.pivot_table(index='file', columns='header', values='value', aggfunc='first').reset_index()
result = result[result.index.notna()]
Your output will be like that. Soi we need to remove "header" label .
header file custom_1 custom_2 description index title type
0 /some/page-1.md NaN NaN My description 1 True My title 1 Tutorial
1 /some/page-2.md abc NaN My description 2 False My title 2 Tutorial
2 /some/page-3.md NaN def My description 3 True My title 3 Tutorial
For removing "header" label u need to use :
result.columns.name = None
Final output will be like
file custom_1 custom_2 description index title type
0 /some/page-1.md NaN NaN My description 1 True My title 1 Tutorial
1 /some/page-2.md abc NaN My description 2 False My title 2 Tutorial
2 /some/page-3.md NaN def My description 3 True My title 3 Tutorial