pythonpandasdataframecsvread-csv

Pandas read csv where with id in column 1, headers in col 2 and values in col 3?


I have a data in CSV in below format:

"/some/page-1.md","title","My title 1"
"/some/page-1.md","description","My description 1"
"/some/page-1.md","type","Tutorial"
"/some/page-1.md","index","True"
"/some/page-2.md","title","My title 2"
"/some/page-2.md","description","My description 2"
"/some/page-2.md","type","Tutorial"
"/some/page-2.md","index","False"
"/some/page-2.md","custom_1","abc"
"/some/page-3.md","title","My title 3"
"/some/page-3.md","description","My description 3"
"/some/page-3.md","type","Tutorial"
"/some/page-3.md","index","True"
"/some/page-3.md","custom_2","def"

I am reading it to Pandas DataFrame:

df = pd.read_csv(csvFile, index_col=False, dtype=object, header=None)
print(df)

Output is following:

                  0            1                 2
0   /some/page-1.md        title        My title 1
1   /some/page-1.md  description  My description 1
2   /some/page-1.md         type          Tutorial
3   /some/page-1.md        index              True
4   /some/page-2.md        title        My title 2
5   /some/page-2.md  description  My description 2
6   /some/page-2.md         type          Tutorial
7   /some/page-2.md        index             False
8   /some/page-2.md     custom_1               abc
9   /some/page-3.md        title        My title 3
10  /some/page-3.md  description  My description 3
11  /some/page-3.md         type          Tutorial
12  /some/page-3.md        index              True
13  /some/page-3.md     custom_2               def

I'd like to transform it to DataFrame in below format, where first header is "file" and values are from column 0. Other headers are taken from column 1 and values from column 2:

              file       title       description      type  index  custom_1  custom_2
0  /some/page-1.md  My title 1  My description 1  Tutorial   True       NaN       NaN
1  /some/page-2.md  My title 2  My description 2  Tutorial  False       abc       NaN
2  /some/page-3.md  My title 3  My description 3  Tutorial   True       NaN       def

Is there a way to do this with Pandas?


Solution

  • I have changed your first column names to file, header, and value. So, can easily handle what u want. You need to use pivot_table method to reach your goal. The final code is shown in below.

    df = pd.DataFrame(data, columns=["file", "header", "value"])
    
    
    result = df.pivot_table(index='file', columns='header', values='value', aggfunc='first').reset_index()
    
    result = result[result.index.notna()]
    

    Your output will be like that. Soi we need to remove "header" label .

    header             file custom_1 custom_2       description  index       title      type
    0       /some/page-1.md      NaN      NaN  My description 1   True  My title 1  Tutorial
    1       /some/page-2.md      abc      NaN  My description 2  False  My title 2  Tutorial
    2       /some/page-3.md      NaN      def  My description 3   True  My title 3  Tutorial
    

    For removing "header" label u need to use :

    result.columns.name = None
    

    Final output will be like

                  file custom_1 custom_2       description  index       title      type
    0  /some/page-1.md      NaN      NaN  My description 1   True  My title 1  Tutorial
    1  /some/page-2.md      abc      NaN  My description 2  False  My title 2  Tutorial
    2  /some/page-3.md      NaN      def  My description 3   True  My title 3  Tutorial