I want to import a csv file as a dataframe using pandas. The file's structure looks as provided in the screenshot. (https://i.sstatic.net/N91d7.png) However, for some reason, using
df = pd.read_csv("Test.csv", delimiter = ',')
does not work. The resulting dataframe contains all content in one single column.
How can I separate the columns correctly? Thanks in advance.
I already played around with different options for the "read_csv" prompt, however, I did not yet find a solution.
I'm not sure how you got a csv file with ambiguous commas (the values are unquoted and contain commas, and the separator is also a comma), but the good news is ...
Assuming the entire file follows the format in your question (specifically, if every second column is empty), this should do what you want:
import pandas as pd
text='Fact,Fact Note,' + ','.join(f'{x},Value Note for {x}' for x in ['California','Arkansas','Arizona','Alaska','Alabama','United States'])
text += '''
Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554,'''
print(text,'\n')
from io import StringIO
i = 0
rows = []
with StringIO(text) as f:
for line in f:
if not i:
columns = line[:-1].split(',') #-1 is to avoid newline
print(columns)
else:
vals = line[:-1].split(',,')
row = [vals[0],''] + [x for z in zip([int(val.replace(',', '')) for val in vals[1:]], ['']*(len(vals)-1)) for x in z]
rows.append(row)
i += 1
print(rows)
df = pd.DataFrame(rows, columns=columns)
print('','',df,sep='\n')
Sample input:
Fact,Fact Note,California,Value Note for California,Arkansas,Value Note for Arkansas,Arizona,Value Note for Arizona,Alaska,Value Note for Alaska,Alabama,Value Note for Alabama,United States,Value Note for United States
Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554
Output:
Fact Fact Note California Value Note for California ... Alabama Value Note for Alabama United States Value Note for United States
0 Population Estimates, July 1, 2022, (V2022) 39029342 ... 5074296 333287557
1 Population Estimates, July 1, 2021, (V2021) 39142991 ... 5049846 332031554
[2 rows x 14 columns]
Transposed output (easier to read):
0 1
Fact Population Estimates, July 1, 2022, (V2022) Population Estimates, July 1, 2021, (V2021)
Fact Note
California 39029342 39142991
Value Note for California
Arkansas 3045637 3028122
Value Note for Arkansas
Arizona 7359197 7264877
Value Note for Arizona
Alaska 733583 734182
Value Note for Alaska
Alabama 5074296 5049846
Value Note for Alabama
United States 333287557 332031554
Value Note for United States
Note that I have used a string and StringIO class instead of a text file, for ease of creating the example.