sas filtering fips

SAS DATA FILTERING QUESTION: I'm trying to filter and narrow down a large dataset in SAS to specific counties in the US


I have a large SAS dataset which I am trying to narrow down to specific counties in the US. The variables I have are: 'state_FIPS_code' and 'county_FIPS_code'. Each state has a unique FIPS code (e.g. California FIPS = 06, New York FIPS = 36). However, a county can have the same FIPS code as a county in another state (e.g. Autauga County, Alabama FIPS = 001, and Kent County, Delaware FIPS = 001).

So, in order to narrow down the dataset to the counties I need, I first thought of narrowing down the states, and then narrowing down the counties. The problem, however, is that unwanted counties would be included as well. For example, if I narrow it down to Florida, Alabama, and Texas, and then narrow down the county FIPS codes to 1, 3, and 5, the result might include county 5 in Florida even though I only want the one in Texas.

I have been struggling with this issue for weeks and can't seem to figure it out. I'm trying to find the most efficient way to do this without having to write 500+ lines of code. Thank you in advance for your help.


Solution

  • The FIPS codes for counties already include the FIPS code for the state as their first two digits. https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt

    Examples:

     county-level      place
      FIPS code        name
     ------------    --------------
        01000        Alabama
        01001        Autauga County
        01003        Baldwin County
    ...
        02000        Alaska
        02013        Aleutians East Borough
        02016        Aleutians West Census Area
    ...
    

    If for some reason your dataset has stored only the last 3 digits of the county codes, you can concatenate the state and county codes to rebuild the full five-digit FIPS county code.

    So if your variables are already CHARACTER strings then use

    where cats(state_FIPS_code,county_FIPS_code) in 
        ('01001'
        ,'01003'
        ,'02013'
        ,'02016'
        )
    

    If your variables are numeric instead, first use the PUT() function with the Z format to convert them to zero-padded strings.

    where put(state_FIPS_code,Z2.)||put(county_FIPS_code,Z3.) in  ...
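
    As a rough sketch of how that WHERE clause handles the Florida/Texas case from the question, the step below assumes numeric variables and a small made-up input dataset. The dataset names have and want, the variable value, and the sample rows are illustrative assumptions, not part of the original data.

    ```sas
    /* Hypothetical sample data: numeric state and county FIPS codes. */
    data have;
      input state_FIPS_code county_FIPS_code value;
      datalines;
    1 1 10
    12 5 20
    48 5 30
    48 7 40
    ;
    run;

    /* Keep only the listed state+county combinations.               */
    /* Z2. and Z3. pad with leading zeros, so 48 and 5 give '48005'. */
    data want;
      set have;
      where put(state_FIPS_code, z2.) || put(county_FIPS_code, z3.)
            in ('01001', '48005');
    run;
    ```

    With this input, county 5 in Texas (48005) is kept while county 5 in Florida (12005) is dropped, because the comparison is made on the combined five-digit code rather than on the county code alone.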
    

    Normally, to avoid typing a long list of codes like that, you would put the list into a dataset and then just combine the two datasets.
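
    A minimal sketch of what such a list dataset might look like, assuming the FIPS variables are numeric. The name county_list matches the merge step in this answer, and the four rows simply reuse the example codes shown earlier; replace them with the counties you actually need.

    ```sas
    /* Hypothetical lookup table: one row per wanted state/county pair. */
    /* Variable names and types should match those in the big dataset.  */
    data county_list;
      input state_FIPS_code county_FIPS_code;
      datalines;
    1 1
    1 3
    2 13
    2 16
    ;
    run;
    ```

    If the variables in the main dataset are character instead, the INPUT statement would need `$` after each variable name and the values would need their leading zeros.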

    If both datasets are already sorted by state_FIPS_code and county_FIPS_code, then a simple MERGE is the easiest approach. For example, here is what such a merge would look like if you have the county codes split into two variables:

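    data want;
      /* in1/in2 flag which input dataset contributed to the current BY group */
      merge have(in=in1) county_list(in=in2);
      by state_FIPS_code county_FIPS_code;
      /* keep only rows whose state/county pair appears in both datasets */
      if in1 and in2;
    run;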
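
    If the datasets are not already sorted, one alternative (not the MERGE shown above) is a PROC SQL inner join, which does not need sorted input. This is only a sketch under the same assumptions: have is the large dataset, county_list holds one row per wanted state/county pair, and the FIPS variables have the same type in both.

    ```sas
    /* Inner join: keep rows of HAVE whose state/county pair is in COUNTY_LIST. */
    proc sql;
      create table want as
      select h.*
      from have as h
           inner join county_list as c
             on  h.state_FIPS_code  = c.state_FIPS_code
             and h.county_FIPS_code = c.county_FIPS_code;
    quit;
    ```

    Note that duplicate pairs in county_list would duplicate the matching rows in the result, so the list should contain each pair only once. The other route is to run PROC SORT on both datasets with the same BY variables and then use the MERGE step.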