pandas stata dta

Unable to open .dta files because of version


Reading the file with pandas fails with this error:

    Version of given Stata file is 44. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).

    import pandas as pd
    Citations2 = pd.read_stata('Citations_2000-2010 part 2.dta')

I want to convert this file to CSV.


Solution

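  • Install pyreadstat and use it to read the file, then write the result to CSV:

    # pip install pyreadstat
    import pyreadstat
    
    # read_dta returns the data as a DataFrame plus a metadata object
    df, meta = pyreadstat.read_dta('Citations_2000-2010 part 2.dta')
    
    # index=False keeps the row index out of the CSV
    df.to_csv('Citations_2000-2010 part 2.csv', index=False)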
    

    Details of the resulting DataFrame:

    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 13569764 entries, 0 to 13569763
    Data columns (total 8 columns):
     #   Column       Dtype  
    ---  ------       -----  
     0   patent       int64  
     1   citation     float64
     2   cit_date     object 
     3   cit_name     object 
     4   cit_kind     object 
     5   cit_country  object 
     6   category     object 
     7   citseq       object 
    dtypes: float64(1), int64(1), object(6)
    memory usage: 828.2+ MB
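
To check which format version a .dta file declares before picking a reader, the header can be inspected directly. The following is a minimal sketch, not part of pandas or pyreadstat: the helper name `peek_dta_version` is hypothetical, and it assumes the usual layout in which older files store the version number in the first byte, while newer ones (117 and later) start with `<stata_dta>` and carry the number in a `<release>` tag.

```python
import re


def peek_dta_version(path):
    """Return the format version a .dta file declares, or None if unreadable.

    Hypothetical helper for illustration; relies on the header layout
    described above and does not validate the rest of the file.
    """
    with open(path, "rb") as fh:
        head = fh.read(64)  # the version sits at the very start of the file
    if not head:
        return None
    if head.startswith(b"<stata_dta>"):
        # Newer layout: <stata_dta><header><release>NNN</release>...
        match = re.search(rb"<release>(\d+)</release>", head)
        return int(match.group(1)) if match else None
    # Older layout: the first byte is the version number itself
    return head[0]
```

Calling `peek_dta_version('Citations_2000-2010 part 2.dta')` on the file from the question should return the same number that the pandas error message reports. A value outside the list in that message means pandas will refuse the file, and a different reader such as pyreadstat is worth trying.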