I am trying to convert a text
into a dataframe
using Python.
sample_text: 'This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34'
Final Desired output:
Steps I am following to achieve the above output are listed below :
print()
to process this text formatted_text = print('This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34')
but it cant be assigned as print()
returns NoneType
, so I get an error here.Desired output after this step:
This is
sample text
Name|age
--|--
1.abc|45
2.xyz|34
line break text
stored in a variable
to be read as a CSV with the separator |
to create a dataframe: I have been thinking of processing this as pd.read_csv(formatted_text,sep='|', skipinitialspace=True)
Desired_output after this step:
I tried earlier explaining this problem in SO post but I guess I wasn't able to explain it well and it got closed. I Hope I am able to explain my issue this time. It could be a silly task but I have been stuck at this for a long time now and would appreciate any help.
A possible solution:
text = 'This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34'
pd.read_csv(StringIO(text), lineterminator='\n', engine='c', header=None)
Output:
0
0 This is
1 sample text
2 Name|age
3 --|--
4 1.abc|45
5 2.xyz|34
To split the columns, we can use str.split
after read_csv
:
(pd.read_csv(StringIO(text), lineterminator='\n', engine='c', header=None)[0]
.str.split('|', expand=True))
Output:
0 1
0 This is None
1 sample text None
2 Name age
3 -- --
4 1.abc 45
5 2.xyz 34