python pandas saspy

How to filter/query data with the Python SASPy to_df() function


I am working in Python with some data pulled from a SAS server. I am currently using the SASPy to_df() function to bring it from SAS into a local pandas DataFrame.

I would like to know if it's possible to filter/query the data being transferred, so I can avoid bringing over unneeded data and speed up my download.

I couldn't find anything in the SASPy documentation. It only offers the possibility of using "**kwargs", but I couldn't figure out how to use that for this.

Thanks.


Solution

  • You need to define the sasdata object using the WHERE= dataset option (passed through its dsopts argument) to limit the observations pulled.

    https://sassoftware.github.io/saspy/api.html#saspy.sasdata.SASdata

    Then when you use the to_df() method only the selected data will be transferred.

    You can also use the KEEP= or DROP= dataset option to limit the variables that are transferred. Remember that in order to reference any variables in the WHERE= option they have to be kept.

    The "**kwargs" appear to be options specific to the access method (how you connect to the SAS server) or to pandas, not a way to filter rows, so they are not relevant to what you want.
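
To make the approach above concrete, here is a minimal sketch. The helper names build_dsopts and fetch_filtered are hypothetical (they are not part of SASPy), and the sashelp.cars table and its column names in the usage comment are only an assumed example. The sketch relies on the dsopts dictionary keys 'where', 'keep' and 'drop' that the SASdata documentation linked above describes, and it expects an already-connected session object.

```python
def build_dsopts(where=None, keep=None, drop=None):
    """Build a dsopts dict for sasdata(), leaving out options that are not set.

    Hypothetical helper, not part of SASPy.
    """
    dsopts = {}
    if where:
        dsopts["where"] = where  # SAS WHERE expression, as a string
    if keep:
        dsopts["keep"] = keep    # variables to transfer (string or list)
    if drop:
        dsopts["drop"] = drop    # variables to leave behind (string or list)
    return dsopts


def fetch_filtered(sas, table, libref, **options):
    """Return a pandas DataFrame with only the selected rows and columns.

    `sas` is assumed to be an already-connected saspy.SASsession.
    """
    # The dataset options travel with the SASdata object, so the filtering
    # happens on the SAS side before to_df() transfers anything.
    data = sas.sasdata(table, libref=libref, dsopts=build_dsopts(**options))
    return data.to_df()


# Usage (needs a configured SAS connection, so it is left commented out):
# import saspy
# sas = saspy.SASsession()
# df = fetch_filtered(
#     sas, "cars", "sashelp",
#     where='msrp < 20000 and make = "Ford"',
#     keep=["make", "model", "msrp"],  # includes the variables used in WHERE
# )
```

Note that the keep list in the usage comment includes make and msrp, since the WHERE expression refers to them.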