delphi-2010 tstringlist

How to extract the first instance of unique strings


I need to extract a list of unique items from 12 years' worth of consistent, computer-generated, one-per-day text files. The filenames vary only by the included date, so it is easy to generate the required name in code. Each file lists all the aircraft movements at my local airport during the given day, in time order. Naturally, the same aircraft come and go many times, and the objective is to loop through the files, pick out the first time each individual aircraft appears (the first visit, or FV), copy that line to a list, and ignore the aircraft from then on. The result should be a list of all the first visits in date order. Should be simple, but... My program is small, so I am including the entire implementation code.

procedure TForm1.FormCreate(Sender: TObject);
begin
  FileDate := StrToDate('01/01/2007');
  FName := 'E:LGW Reports/SBSLGW2007-01-01.txt'; //1st file to be read
  FDStr := copy(FName, 21, 10);
  TempList := TStringList.Create; //temp holder for file contents
  FVCheckList := TStringList.Create; //holds unique identifier (UID)
  FVCheckList.Sorted := TRUE;
  FVCheckList.Duplicates := dupIgnore;
  FVList:= TStringList.Create;  //the main output
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  i: integer;
begin
  Memo1.Lines.Append('Started');
  Repeat
    TempList.Clear;
    TempList.LoadFromFile(FName);
    for i := 1 to TempList.Count-1 do
    begin
      Line := TempList.Strings[i];
      //create a Unique identifier (UID) from elements in Line
      Serial := Trim(Copy(Line, 22, 9)); 
      MsnPos1 := Pos('[', Line) + 1;
      MsnPos2 := Pos(']', Line);
      Msn := copy(Line, MsnPos1, (MsnPos2 - MsnPos1));
      UID := Serial + '/' + Msn;
      //          
      if (FVCheckList.IndexOf(UID) < 0) then
      begin
        FVCheckList.Append(UID);
      //Add date of file to Line, otherwise it gives no clue when FV was
        FVList.Append(FormatDateTime('YYYY-MM-DD', FileDate) + ' ' + Line);
        FileDate := IncDay(FileDate, 1);
        FName := 'E:LGW Reports/SBSLGW' + FormatDateTime('YYYY-MM-DD', FileDate) + '.txt';
      end;
    end;
  Until FileExists(FName) = FALSE;
  FVCheckList.SaveToFile('E:LGW Reports/First Visit Checklist.txt');
  FVList.SaveToFile('E:LGW Reports/First Visits.txt');
  Memo1.Lines.Append('Finished');
  Memo1.Lines.SaveToFile('E:LGW Reports/Files parsed.txt');
end;

procedure TForm1.FormClose(Sender: TObject; var Action: TCloseAction);
begin
  TempList.Free;
  FVCheckList.Free;
  FVList.Free;
end;

There are no compiler errors, it runs to completion in seconds and produces the two text files specified, correctly formatted. The big problem is that the lines actually listed in FVList are not always the very first visit of the aircraft, they can be the first, the most recent or somewhere in between. I cannot see any obvious clue as to why the wrong instance is appearing: if my code is right, then something is wrong with the functioning of TStringList FVCheckList. The fault is far more likely to be something I have overlooked, or my understanding of how .dupIgnore works, or maybe my looping isn't working as it should.

I should be very grateful for any practical help. Many thanks in advance.


Solution

  • Repeat
      ...
    Until FileExists(FName) = FALSE;
    

    Should be

    While FileExists(FName) = TRUE do
    Begin
    End;
    

    If the first 2007-01-01 file does not exist, your code will crash on the first LoadFromFile() since you don't check for the file's existence before loading it, unlike with the subsequent files.

    Otherwise, I would suggest sticking with repeat but assign FName at the top of each loop iteration instead of initializing it outside the loop and then reassigning at the bottom of each iteration. No need to duplicate efforts.

    If you check IndexOf() manually, you don't need to use Sorted or dupIgnore at all. This is what you should be doing in this situation. When dupIgnore ignores a new string, Append() doesn't tell you that the string was ignored. To do that, you would have to check whether the Count was actually increased or not.
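
    To illustrate the Count check mentioned above, here is a minimal sketch, assuming a sorted list with dupIgnore. AddIfNew is a hypothetical helper name and the UID value is made up for the demonstration:

    ```pascal
    program DupIgnoreDemo;

    {$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
    {$APPTYPE CONSOLE}

    uses
      SysUtils, Classes;

    // Hypothetical helper: adds S and reports whether the list really grew.
    // With Sorted = True and Duplicates = dupIgnore, Add() silently drops a
    // duplicate, so comparing Count before and after is the only signal.
    function AddIfNew(List: TStrings; const S: string): Boolean;
    var
      OldCount: Integer;
    begin
      OldCount := List.Count;
      List.Add(S);
      Result := List.Count > OldCount;
    end;

    var
      Seen: TStringList;
    begin
      Seen := TStringList.Create;
      try
        Seen.Sorted := True;
        Seen.Duplicates := dupIgnore;
        // 'G-ABCD/1234' is a made-up UID used only for this demonstration.
        Writeln(AddIfNew(Seen, 'G-ABCD/1234')); // first call: the string is new, the list grows
        Writeln(AddIfNew(Seen, 'G-ABCD/1234')); // second call: ignored, Count is unchanged
        Writeln(Seen.Count);                    // still one entry
      finally
        Seen.Free;
      end;
    end.
    ```

    Checking IndexOf() yourself, as in the code below, avoids the need for this bookkeeping.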

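    Inside the outer loop, the reassignment of FileDate and FName should be outside of the inner for loop, not inside it. As written, the date advances once for every new aircraft found, so later lines from the same file are stamped with the wrong date and whole files are skipped; a file with no new aircraft never advances the date at all, so it would be reloaded forever.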

    Try this instead:

    procedure TForm1.FormCreate(Sender: TObject);
    begin
      FileDate := EncodeDate(2007,1,1);
      FDStr := FormatDateTime('YYYY-MM-DD', FileDate);
      TempList := TStringList.Create; //temp holder for file contents
      FVCheckList := TStringList.Create; //holds unique identifier (UID)
      FVList := TStringList.Create; //the main output
    end;
    
    procedure TForm1.Button1Click(Sender: TObject);
    var
      i: integer;
    begin
      Memo1.Lines.Append('Started');
      Repeat
        FName := 'E:LGW Reports/SBSLGW' + FormatDateTime('YYYY-MM-DD', FileDate) + '.txt';
        if not FileExists(FName) then Break;
        Memo1.Lines.Append(FName);
        TempList.LoadFromFile(FName);
        for i := 1 to TempList.Count-1 do
        begin
          Line := TempList.Strings[i];
          //create a Unique identifier (UID) from elements in Line
          Serial := Trim(Copy(Line, 22, 9));
          MsnPos1 := Pos('[', Line) + 1;
          MsnPos2 := PosEx(']', Line, MsnPos1); //PosEx needs StrUtils in the uses clause
          Msn := copy(Line, MsnPos1, (MsnPos2 - MsnPos1));
          UID := Serial + '/' + Msn;
          if FVCheckList.IndexOf(UID) = -1 then
          begin
            FVCheckList.Append(UID);
            //Add date of file to Line, otherwise it gives no clue when FV was
            FVList.Append(FormatDateTime('YYYY-MM-DD', FileDate) + ' ' + Line);
          end;
        end;
        FileDate := IncDay(FileDate, 1);
      Until False; //the loop exits via Break when the next file does not exist
      FVCheckList.SaveToFile('E:LGW Reports/First Visit Checklist.txt');
      FVList.SaveToFile('E:LGW Reports/First Visits.txt');
      Memo1.Lines.Append('Finished');
      Memo1.Lines.SaveToFile('E:LGW Reports/Files parsed.txt');
    end;
    
    procedure TForm1.FormDestroy(Sender: TObject);
    begin
      TempList.Free;
      FVCheckList.Free;
      FVList.Free;
    end;
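
    If the results still look wrong, it may help to test the UID logic against a single line in isolation. The following is only a sketch: BuildUID is a hypothetical helper name, and since the question does not show the file layout, the sample line is invented to match the column positions used in the code above (serial in columns 22 to 30, msn in square brackets).

    ```pascal
    program UidDemo;

    {$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
    {$APPTYPE CONSOLE}

    uses
      SysUtils, StrUtils;

    // Hypothetical helper: builds 'serial/msn' from one report line.
    // Msn is left empty if the line has no [..] pair.
    function BuildUID(const Line: string): string;
    var
      Serial, Msn: string;
      P1, P2: Integer;
    begin
      Serial := Trim(Copy(Line, 22, 9));
      Msn := '';
      P1 := Pos('[', Line);
      if P1 > 0 then
      begin
        // Search for the closing bracket only after the opening one.
        P2 := PosEx(']', Line, P1 + 1);
        if P2 > 0 then
          Msn := Copy(Line, P1 + 1, P2 - P1 - 1);
      end;
      Result := Serial + '/' + Msn;
    end;

    var
      Sample: string;
    begin
      // Invented line: 21 filler characters, a 9-character serial field,
      // then some text with the msn in brackets.
      Sample := StringOfChar('.', 21) + 'G-ABCD   ' + ' B738 [30123]';
      Writeln(BuildUID(Sample)); // prints G-ABCD/30123
    end.
    ```

    Running this against a few real lines copied from the reports would show whether two different aircraft can end up with the same UID, or one aircraft with two.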