I need to generate a PostgreSQL schema from a dataframe. I found that the csvkit library comes closest to matching datatypes. From the terminal, I can run csvkit and generate a PostgreSQL schema for a CSV file on my desktop with this command from the docs:
csvsql -i postgresql myFile.csv
csvkit docs - https://csvkit.readthedocs.io/en/stable/scripts/csvsql.html
And I can run the terminal command in my script via this code:
import os
a=os.popen("csvsql -i postgresql Desktop/myFile.csv").read()
However, I have a dataframe that I have converted to a CSV string, and I need to generate the schema from that string:
csvstr = df.to_csv()
The docs say, under positional arguments:
The CSV file(s) to operate on. If omitted, will accept
input on STDIN
How do I pass my variable csvstr into the line of code a=os.popen("csvsql -i postgresql csvstr").read() as a variable?
I tried the line below but got the error OSError: [Errno 7] Argument list too long: '/bin/sh':
a=os.popen("csvsql -i postgresql {}".format(csvstr)).read()
Thank you in advance
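You can't pass such a big string on the command line: the operating system limits the total size of the argument list, which is what Errno 7 is reporting. Instead, save the data to a file and pass its path to csvsql.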
csvstr = df.to_csv()
# csvstr is already CSV text, so write it out as-is.
# (csv.writer.writerows(csvstr) would treat every character as its own row.)
with open('my_cool_df.csv', 'w', newline='') as csvfile:
    csvfile.write(csvstr)
And later:
a=os.popen("csvsql -i postgresql my_cool_df.csv").read()
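Alternatively, since the docs quoted in the question say csvsql accepts input on STDIN when no file is given, the string can probably be piped in directly, with no intermediate file. This is a minimal sketch, assuming csvsql is on the PATH. The helper names pipe_to_command and schema_from_csv_string and the table name my_table are made up for illustration. The --tables option is used on the assumption that it names the table when there is no filename to derive one from.

```python
import subprocess


def pipe_to_command(cmd, text):
    """Run cmd (a list of arguments), feed text to its STDIN, return its STDOUT."""
    result = subprocess.run(
        cmd,
        input=text,           # sent to the child process on STDIN
        capture_output=True,  # collect STDOUT and STDERR instead of printing them
        text=True,            # work with str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def schema_from_csv_string(csvstr, table_name="my_table"):
    """Hypothetical wrapper: ask csvsql for a PostgreSQL CREATE TABLE statement."""
    # Assumption: csvsql is installed and on the PATH.
    # No file argument is given, so csvsql reads the CSV from STDIN.
    cmd = ["csvsql", "-i", "postgresql", "--tables", table_name]
    return pipe_to_command(cmd, csvstr)
```

Usage would then look like a = schema_from_csv_string(df.to_csv(index=False)). Because the command is passed as a list and the data goes through STDIN, no shell is involved and the argument-length limit does not apply to the CSV text.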