I have a CSV file with dimensions 100 x 512 that I want to process further in Spark. The problem is that the file has no header row, i.e. no column names, and I need these column names for further ETL and machine learning. I have the column names in another file (a text file), and I have to put them as the header of the CSV file mentioned above.
e.g.
CSV file :-
ab 1 23 sf 23 hjh
hs 6 89 iu 98 adf
gh 7 78 pi 54 ngj
jh 5 22 kj 78 jdk
Column headers file :-
one,two,three,four,five, six
I want the output like this :-
one two three four five six
ab 1 23 sf 23 hjh
hs 6 89 iu 98 adf
gh 7 78 pi 54 ngj
jh 5 22 kj 78 jdk
Please suggest a method to add the column headers to the CSV file without replacing its first row. I tried converting it to a pandas DataFrame but couldn't get the expected output.
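First read your CSV file. Pass header=None so that pandas does not treat the first data row as the column names (add sep=' ' as well if the fields are separated by spaces, as in your sample):
from pandas import read_csv
df = read_csv('test.csv', header=None)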
If your dataset has two columns (a and b), assign their names with:
df.columns = ['a', 'b']
Write the new dataframe to a CSV file, passing index=False so the row index is not written as an extra first column:
df.to_csv('test_2.csv', index=False)
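Since the column names live in a separate text file, they can be read from there instead of being typed by hand. The following is only a minimal sketch under some assumptions: the helper name add_header and the file name columns.txt are made up for illustration, the names file is assumed to hold a single comma-separated line, and the sample data is assumed to be space-separated as shown in the question.

```python
import os
import tempfile

import pandas as pd


def add_header(data_path, names_path, out_path, sep=","):
    """Write a copy of data_path with the names from names_path as its header row.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Assumption: the names file holds one comma-separated line of column names.
    with open(names_path) as f:
        names = [name.strip() for name in f.read().strip().split(",")]

    # header=None keeps the first data row as data instead of using it as a header.
    # dtype=str leaves every value exactly as written in the file.
    df = pd.read_csv(data_path, header=None, sep=sep, dtype=str)
    if len(names) != df.shape[1]:
        raise ValueError(f"{len(names)} names for {df.shape[1]} columns")
    df.columns = names

    # index=False avoids writing the row index as an extra first column.
    df.to_csv(out_path, sep=sep, index=False)
    return df


# Small demonstration using the sample rows from the question.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "test.csv")
names_path = os.path.join(workdir, "columns.txt")  # hypothetical file name
out_path = os.path.join(workdir, "test_2.csv")

with open(data_path, "w") as f:
    f.write("ab 1 23 sf 23 hjh\nhs 6 89 iu 98 adf\ngh 7 78 pi 54 ngj\njh 5 22 kj 78 jdk\n")
with open(names_path, "w") as f:
    f.write("one,two,three,four,five, six\n")

result = add_header(data_path, names_path, out_path, sep=" ")
with open(out_path) as f:
    print(f.read())
```

The length check is there because a mismatch between the number of names and the number of columns (512 in your real file) would otherwise only surface as a less obvious pandas error. If you load the file directly in Spark instead, a DataFrame read without a header can be renamed in a similar way with toDF(*names).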