I get a string variable (reff) from pandas to_string.
The reff string is first separated by \n. Within each row, the separator is a variable number of spaces, and as the screenshot shows, each column is right-aligned. How can I convert it back to a pandas DataFrame? Thanks.
reff_split = [' Name Average Cost Cuisines Aggregate Rating City',
' Thai Garden 13 Cafe, American, Desserts 4.2 Abilene',
' Crispy Crust 54 Desserts, Bakery, Cafe, American, Seafood 3.4 Abilene',
' Finger Licious 35 Fast Food, Cafe, BBQ, Seafood 0.0 Abilene',
' Mx Corn 62 Tea, Cafe, Italian 3.1 Abilene',
' LPK Waterfront 61 Pizza, Italian, BBQ, Fast Food, Seafood 3.7 Abilene',
' Cakes Degree 18 Cafe, Mexican, BBQ, Desserts 3.4 Abilene',
' Chateau Garlic 42 Pizza, Bakery, Desserts, Fast Food 3.0 Abilene',
' Mediumwelldone 91 Tea, French, BBQ, Cafe, Seafood 3.5 Abilene',
' Biryani Express 11 Tea, Pizza, French, Bakery, Fast Food 0.0 Abilene',
' Subway 57 Fast Food, Tea, Bakery, Italian 3.3 Abilene',
' The Grand Trunk Road 80 Tea, Pizza, Bakery, BBQ, Chinese, Mediterranean 3.9 Abilene',
" Bhikharam's 89 Bakery, Pizza, Desserts 3.0 Abilene",
' Pawan Foods 74 Tea, Cafe, American, Desserts 0.0 Abilene',
' Dilli Darbar 40 Pizza, Bakery, BBQ, Seafood 2.3 Abilene',
' Piyu Fast Food 88 Mexican, Indian, Desserts, Pizza 0.0 Abilene',
' Madras Cafe 54 Tea, Pizza, American, Cafe, Indian, Seafood 0.0 Abilene',
' Druk 79 Cafe, Bakery, Pizza, Seafood 4.1 Abilene',
' Barista 88 Fast Food, Mexican, Bakery, Pizza 3.3 Abilene',
' Mamagoto 93 Desserts, Tea, Bakery, Cafe, Indian, Mediterranean 4.1 Abilene',
' Sindhi Kulfi 37 Pizza, French, BBQ, Fast Food, Cafe 3.0 Abilene',
' Aggarwal Sweet & Bakers 79 Pizza, Mediterranean, BBQ 2.9 Abilene',
' Lotus Kitchen 15 French, Bakery, BBQ 0.0 Abilene',
' Inam Muradabadi 35 Tea, Seafood, Mediterranean, Fast Food 0.0 Abilene',
" Domino's Pizza 89 Bakery, Pizza, American, Seafood 0.0 Abilene",
' Linx - Premier Inn 44 Tea, Bakery, Pizza 3.0 Abilene',
' Scrummy Bites 69 Indian, Desserts, Seafood 3.1 Abilene',
'Vanshika Indian, Chinese, & Parantha Corner 60 Pizza, Fast Food 3.1 Abilene',
' Otik Hotshop 82 Desserts, Tea, BBQ, Fast Food, Indian 3.1 Abilene',
' Gelato Vinto 12 Seafood, Mexican, Indian, Fast Food 3.1 Abilene',
" Tomato's 21 Tea, Cafe, Bakery, BBQ 4.1 Abilene"]
I doubt the solutions in the linked duplicate (including the nested one) can help you rebuild the DataFrame. In your case, one option is to compute the width of each column from the header (the first element/line) and pass those widths to read_fwf:
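import io, re
import pandas as pd
# Split the header into right-aligned chunks; each chunk's length is one column width.
header = re.findall(r".+?\S(?=\s\s+|$)", reff_split[0])
df = pd.read_fwf(io.StringIO("\n".join(reff_split)), widths=[len(h) for h in header])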
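NB: Because the columns are right-aligned, the regex uses a lookahead to end each match at the last character of a header label, so the padding to the left of a label is counted in that column's width.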
Output:
Name Average Cost Cuisines Aggregate Rating City
0 Thai Garden 13 Cafe, Ameri... 4.2 Abilene
1 Crispy Crust 54 Desserts, B... 3.4 Abilene
2 Finger Licious 35 Fast Food, ... 0.0 Abilene
3 Mx Corn 62 Tea, Cafe, ... 3.1 Abilene
4 LPK Waterfront 61 Pizza, Ital... 3.7 Abilene
.. ... ... ... ... ...
25 Scrummy Bites 69 Indian, Des... 3.1 Abilene
26 Vanshika In... 60 Pizza, Fast... 3.1 Abilene
27 Otik Hotshop 82 Desserts, T... 3.1 Abilene
28 Gelato Vinto 12 Seafood, Me... 3.1 Abilene
29 Tomato's 21 Tea, Cafe, ... 4.1 Abilene
[30 rows x 5 columns]
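For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch on a two-row sample. The sample rows and the names `rows`, `col_widths`, `lines`, `widths` and `restored` are made up for illustration, and the text is built by hand with a two-space gap between right-aligned columns, which is an assumption standing in for the real to_string output (its exact spacing may differ).

```python
import io
import re

import pandas as pd

# Hypothetical sample: a header plus two rows taken from the question's data.
rows = [
    ("Name", "Average Cost", "Cuisines", "Aggregate Rating", "City"),
    ("Thai Garden", "13", "Cafe, American, Desserts", "4.2", "Abilene"),
    ("Mx Corn", "62", "Tea, Cafe, Italian", "3.1", "Abilene"),
]

# Assumption: right-aligned columns separated by two spaces, built by hand
# rather than taken from to_string so the layout is fully known.
col_widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
lines = ["  ".join(cell.rjust(w) for cell, w in zip(row, col_widths)) for row in rows]

# Each match is a header label together with the whitespace to its left,
# so its length is the width of that fixed-width field.
header = re.findall(r".+?\S(?=\s\s+|$)", lines[0])
widths = [len(h) for h in header]

# read_fwf slices every line at those widths and strips the padding.
restored = pd.read_fwf(io.StringIO("\n".join(lines)), widths=widths)
print(restored.dtypes)
```

Note that the pattern only splits the header where at least two whitespace characters separate adjacent labels. If two header labels are one space apart, they would be merged into a single column, and the widths would have to be supplied another way.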