[SOLVED] Modify Data Step with Duplicate Values?

I have a dataset with two columns. I need to change some of the values in the Balance column, and the code below works great:

data NAS_STG;
   modify NAS_STG;
   if Balance = "Test1" then Balance ="Test Account 1";
run;

However, there is one issue. The spreadsheet this was imported from has some duplicate values. The rows aren't actually the same; they sit in different sections of the sheet and just share the same label.

Is there a way to rename the first occurrence of a value and then, when the duplicate turns up later, give it a different name? I need to make sure the repeated values don't end up with the same name.

For example:

data NAS_STG;
   modify NAS_STG;
   if Balance = "Workout Loan" then Balance ="Workout";     /* FIRST INSTANCE */
   if Balance = "Workout Loan" then Balance ="Int_Workout"; /* SECOND INSTANCE */
run;

Solution

There's no need for MODIFY here; a simple SET statement will do. It's probably best to create a new dataset rather than change the existing one. That way, if something goes wrong with the code, you don't have to re-import the XLSX file.
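For comparison, here is a sketch of the original rename rewritten with SET so that it writes a new dataset. One side benefit of SET over MODIFY: MODIFY cannot change a variable's attributes, while a LENGTH statement placed before SET can widen BALANCE, so a longer replacement value such as "Test Account 1" is less likely to be truncated if BALANCE was imported with a shorter length. The `$ 40` length is an assumption; pick a value at least as long as your longest existing or new value.

```sas
data NAS_STG_fixed;
  /* Assumed length: must come before SET to take effect */
  length Balance $ 40;
  set NAS_STG;
  if Balance = "Test1" then Balance = "Test Account 1";
run;
```

NAS_STG itself is left untouched, so you can rerun the step after fixing any mistakes.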

If the data is not already sorted by BALANCE, sort it first.

proc sort data=NAS_STG; by balance; run;

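Now use the sorted order to change only one of the observations. With a BY statement, FIRST.BALANCE is 1 only on the first observation in each group of identical BALANCE values. PROC SORT keeps observations with the same BY value in their original relative order by default (the EQUALS option), so that first observation is the first occurrence from the imported data.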

data NAS_STG_fixed;
  set NAS_STG;
  by balance;
  if first.balance and balance = "Test1" then Balance ="Test Account 1";
run;
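Applied to the "Workout Loan" example from the question, the same FIRST. logic can rename the first occurrence one way and every later occurrence another. This is a sketch, assuming NAS_STG has already been sorted by BALANCE as above; note that a third or later duplicate would also become "Int_Workout".

```sas
data NAS_STG_fixed;
  set NAS_STG;
  by balance;
  if balance = "Workout Loan" then do;
    /* First observation in the "Workout Loan" BY group */
    if first.balance then balance = "Workout";
    /* Every later observation with the same value */
    else balance = "Int_Workout";
  end;
run;
```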

Changing the values can leave the data out of order, so you may need to re-sort it by BALANCE if you are going to use that variable to combine it with other data.

proc sort data=NAS_STG_fixed;
   by balance;
run;
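If the original row order from the spreadsheet matters (the duplicates live in different sections of the sheet), an alternative that skips sorting is to count occurrences as you read. This is a different technique from the answer above: a sum statement keeps a running count across observations. The counter name `n_workout` is hypothetical.

```sas
/* Alternative sketch: no sort, so the original row order is kept */
data NAS_STG_fixed;
  set NAS_STG;
  if Balance = "Workout Loan" then do;
    /* Sum statement: n_workout (hypothetical name) is retained and starts at 0 */
    n_workout + 1;
    if n_workout = 1 then Balance = "Workout";  /* first occurrence */
    else Balance = "Int_Workout";               /* all later occurrences */
  end;
  drop n_workout;
run;
```

Because nothing is re-sorted, the renamed rows stay in the same sections they came from.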