
Convert .CSV files to .DTA files in Python


I'm looking to automate converting many .CSV files into .DTA files with Python. .DTA is the file format used by the Stata statistical software.

However, I have not been able to find a way to do this.

R has write.dta() (in the foreign package), which writes an R data frame to a .dta file, and R can be called from Python through RPy. However, I can't figure out how to use RPy to access R's write.dta() function.

Any ideas?


Solution

  • You need rpy2 for Python and also the foreign package installed in R. You do that by starting R and typing install.packages("foreign"). You can then quit R and go back to Python.

    Then, in Python:

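    import rpy2.robjects as robjects
    # Load R's foreign package, which provides write.dta()
    robjects.r("library(foreign)")
    # Read the CSV into an R data frame named x
    robjects.r('x <- read.csv("test.csv")')
    # Write x out as a Stata .dta file
    robjects.r('write.dta(x, "test.dta")')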
    

    You can build the string passed to robjects.r from Python variables, for example:

    robjects.r('x=read.csv("%s")' % fileName)
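Since the goal is to convert many files, the same two R calls can be run in a loop over a directory. The sketch below is one way to do that, assuming rpy2 and the foreign package are installed as described above. The function names (`r_string`, `conversion_commands`, `csv_files`, `convert_directory`) are hypothetical helpers for illustration, not part of rpy2 or foreign. Each path has its backslashes and double quotes escaped before it is spliced into the R code, so Windows-style paths or unusual file names don't break the `%s` string-formatting approach.

```python
import os


def r_string(s):
    """Quote a Python string as an R string literal.

    Backslashes are escaped first, then double quotes, so Windows-style
    paths and names containing quotes survive being embedded in R code.
    """
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def conversion_commands(csv_path):
    """Build the R commands that read one CSV and write a .dta beside it."""
    # Swap the extension (.csv or .CSV) for .dta, keeping the same folder
    dta_path = os.path.splitext(csv_path)[0] + ".dta"
    return [
        "x <- read.csv(%s)" % r_string(csv_path),
        "write.dta(x, %s)" % r_string(dta_path),
    ]


def csv_files(directory):
    """List .csv files in a directory, matching the extension case-insensitively."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".csv")
    )


def convert_directory(directory):
    """Convert every CSV in `directory` to .dta via rpy2 and R's foreign package."""
    # Imported here so the string helpers above can be used without rpy2/R
    import rpy2.robjects as robjects

    robjects.r("library(foreign)")
    for csv_path in csv_files(directory):
        for command in conversion_commands(csv_path):
            robjects.r(command)
```

For example, `convert_directory("path/to/csvs")` would write `a.dta` next to each `a.csv` in that folder. Matching the extension case-insensitively means files named like `DATA.CSV` are picked up too.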