[SOLVED] Pandas does not separate columns of imported csv file

Pandas does not separate columns of imported csv file

I want to import a csv file as a dataframe using pandas. The file's structure looks as provided in the screenshot. (https://i.sstatic.net/N91d7.png) However, for some reason, using

df = pd.read_csv("Test.csv", delimiter = ',')

does not work. The resulting dataframe contains all content in one single column.

How can I separate the columns correctly? Thanks in advance.

I already played around with different options for the "read_csv" prompt, however, I did not yet find a solution.

Solution

I'm not sure how you got a csv file with ambiguous commas (the values are unquoted and contain commas, and the separator is also a comma), but the good news is ...

Assuming the entire file follows the format in your question (specifically, if every second column is empty), this should do what you want:

import pandas as pd
text='Fact,Fact Note,' + ','.join(f'{x},Value Note for {x}' for x in ['California','Arkansas','Arizona','Alaska','Alabama','United States'])
text += '''
Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554,'''
print(text,'\n')

from io import StringIO
i = 0
rows = []

with StringIO(text) as f:
    for line in f:
        if not i:
            columns = line[:-1].split(',') #-1 is to avoid newline
            print(columns)
        else:
            vals = line[:-1].split(',,')
            row = [vals[0],''] + [x for z in zip([int(val.replace(',', '')) for val in vals[1:]], ['']*(len(vals)-1)) for x in z]
            rows.append(row)
        i += 1
print(rows)
df = pd.DataFrame(rows, columns=columns)
print('','',df,sep='\n')

Sample input:

Fact,Fact Note,California,Value Note for California,Arkansas,Value Note for Arkansas,Arizona,Value Note for Arizona,Alaska,Value Note for Alaska,Alabama,Value Note for Alabama,United States,Value Note for United States
Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554

Output:

                                          Fact Fact Note  California Value Note for California  ...  Alabama Value Note for Alabama  United States Value Note for United States
0  Population Estimates, July 1, 2022, (V2022)              39029342                            ...  5074296                             333287557
1  Population Estimates, July 1, 2021, (V2021)              39142991                            ...  5049846                             332031554

[2 rows x 14 columns]

Transposed output (easier to read):

                                                                        0                                            1
Fact                          Population Estimates, July 1, 2022, (V2022)  Population Estimates, July 1, 2021, (V2021)
Fact Note
California                                                       39029342                                     39142991
Value Note for California
Arkansas                                                          3045637                                      3028122
Value Note for Arkansas
Arizona                                                           7359197                                      7264877
Value Note for Arizona
Alaska                                                             733583                                       734182
Value Note for Alaska
Alabama                                                           5074296                                      5049846
Value Note for Alabama
United States                                                   333287557                                    332031554
Value Note for United States

Note that I have used a string and StringIO class instead of a text file, for ease of creating the example.