I have a text file called tropical.txt
that have multiple lists and each list is separated by a new line. Notice the comma is surrounded by spaces.
space here and space here
| |
['papaya' , 'mangosteen' , 'banana']
[]
['coconut' , 'mango']
['mangosteen' , 'papaya']
I tried the following code
import pandas as pd
df = pd.read_csv('tropical.txt', sep='\n', header=None, engine = 'python')
df
which gives me
ValueError: Specified \n as separator or delimiter. This forces the python engine which does not accept a line terminator. Hence it is not allowed to use the line terminator as separator.
If I were to just do
import pandas as pd
df = pd.read_csv('tropical.txt', header= None, engine = 'python')
df
The output isn't what I wanted
0 1 2
0 ['papaya' 'mangosteen' 'banana']
1 [] None None
2 ['coconut' 'mango'] None
3 ['mangosteen' 'papaya'] None
I am expecting
0
0 [papaya,mangosteen,banana]
1 []
2 [coconut,mango]
3 [mangosteen,papaya]
Any suggestion?
You can use read_csv
, by specifying a separator which will not occur in the lines (e.g. \0
) (so that each line will be read as a whole) and ast.literal_eval
as a converter for the values:
import ast
pd.read_csv('tropical.txt', header=None, sep='\0', names=['fruits'], converters={ 'fruits' : ast.literal_eval })
Output:
fruits
0 [papaya, mangosteen, banana]
1 []
2 [coconut, mango]
3 [mangosteen, papaya]