COBOL .csv File IO into Table Not Working


I am trying to learn COBOL, as I had heard of it and thought it would be fun to take a look at. I came across Micro Focus COBOL (not really sure whether that is pertinent to this post), and since I like to write in Visual Studio, that was enough incentive to try and learn it.

I've been reading a lot about it and trying to follow the documentation and examples. So far I've gotten user input and output to the console working, so I decided to try file IO next. That went OK when I was just reading in one 'record' at a time (I realize that 'record' may be incorrect jargon). Although I've been programming for a while, I am an extreme noob with COBOL.

I have a C++ program that I wrote earlier that simply takes a .csv file, parses it, and then sorts the data by whatever column the user wants. I figured it wouldn't be too hard to do the same in COBOL. Well, apparently I misjudged that.

I have a file, edited in Windows using Notepad++, called test.csv, which contains:

4001942600,140,4
4001942700,141,3
4001944000,142,2

This data is from the US Census, which has column headers titled GEOID, SUMLEV, and STATE. I removed the header row, since I couldn't figure out how to read it in at the time, and then read in the other data. Anywho...

I'm using Micro Focus in Visual Studio 2015 on Windows 7 Pro 64-bit. When I step through in the debugger, I can see in-record containing the first row of data, and the UNSTRING works fine for that pass. The next time the program 'loops', I can step through and see that in-record contains the new data. However, after the UNSTRING, the watch display looks like the following when I expand the watch elements:

        REC-COUNTER 002 PIC 9(3) 
+       IN-RECORD   {Length = 42} : "40019427004001942700             000      "    GROUP
-       GEOID   {Length = 3}    PIC 9(10) 
        GEOID(1)    4001942700  PIC 9(10) 
        GEOID(2)    4001942700  PIC 9(10) 
        GEOID(3)    <Illegal data in numeric field> PIC 9(10) 

-       SUMLEV  {Length = 3}    PIC 9(3) 
        SUMLEV(1)   <Illegal data in numeric field> PIC 9(3) 
        SUMLEV(2)   000 PIC 9(3) 
        SUMLEV(3)   <Illegal data in numeric field> PIC 9(3) 

-       STATE   {Length = 3}    PIC X
        STATE(1)        PIC X
        STATE(2)        PIC X
        STATE(3)        PIC X

So I'm not sure why, the second time around, I can see the proper data just before the UNSTRING, but incorrect data is stored in the 'table' after the UNSTRING runs. What is also interesting is that if I continue on, the third time around the correct data is stored in the 'table'.

         identification division.
         program-id.endat.
         environment division.
         input-output section.
         file-control.
             select in-file assign to "C:/Users/Shittin Kitten/Google Drive/Embry-Riddle/Spring 2017/CS332/group_project/cobol1/cobol1/test.csv"
                organization is line sequential.
         data division.     
         file section.
         fd in-file.  
         01 in-record.
             05 record-table.
                 10 geoid     occurs 3 times        pic 9(10).
                 10 sumlev   occurs 3 times       pic 9(3).
                 10 state       occurs 3 times       pic X(1).
         working-storage section.
    01 switches.
     05 eof-switch pic X value "N".
  *  declaring a local variable for counting
    01 rec-counter pic 9(3).
  *  Defining constants for new line and carraige return. \n \r DNE in cobol!
    78 NL  value X"0A".
    78 CR  value X"0D".
    78 TAB value X"09".

  ******** Start of Program ******
   000-main.
     open input in-file.
       perform 
       perform 200-process-records
         until eof-switch = "Y".
       close in-file;
     stop run.
  *********** End of Program ************

  ******** Start of Paragraph  2 *********
   200-process-records.
       read in-file into in-record
         at end move "Y" to eof-switch
         not at end compute rec-counter = rec-counter + 1;
       end-read.
       Unstring in-record delimited by "," into 
           geoid in record-table(rec-counter), 
           sumlev in record-table(rec-counter), 
           state in record-table(rec-counter).

     display "GEOID  " & TAB &">> " & TAB & geoid of record-table(rec-counter).
     display "SUMLEV  >> " & TAB & sumlev of record-table(rec-counter).
     display "STATE  "  & TAB &">> " & TAB & state of record-table(rec-counter) & NL.
  ************* End of Paragraph 2  **************    

I'm very confused about why I can actually see the data after the read operation, but it isn't stored in the table. I have also tried changing the declarations of the table to pic 9(some length), and the result changes, but I can't seem to pinpoint what I'm not getting about this.


Solution

  • I think there are a few things you've not grasped yet, and which you need to.

    In the DATA DIVISION, there are a number of SECTIONs, each of which has a specific purpose.

    The FILE SECTION is where you define data structures which represent data on files (input, output or input-output). Each file has an FD, and subordinate to an FD will be one or more 01-level structures, which can be extremely simple, or complex.

    Some of the exact behaviour is down to the particular implementation for a compiler, but you should treat things this way, for your own "minimal surprise" and for the sake of anyone who later has to amend your programs: for an input file, don't change the data after a READ, unless you are going to update the record (or if you are using a keyed READ, perhaps). You can regard the "input area" as a "window" on your data-file. On the next READ, the window is pointed at a different position. Alternatively, you can regard it as "the next record arrives, obliterating what was there previously". You have put the "result" of your UNSTRING into the record-area. That result will for sure disappear on the next READ. You also have the possibility (if the "window" model is true for your compiler, and depending on the mechanism it uses for IO) of squishing the "following" data as well.

    Your result should be in the WORKING-STORAGE, where it will remain undisturbed by new records being read.
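
    As a minimal sketch of that layout (not the only way to arrange it): the record area below is just one raw line, and the parsed table sits in WORKING-STORAGE. The names csvtable, census-table, census-row and state-code, the 80-byte record length and the relative file name are my assumptions, not taken from the question; the table is sized for the three rows of the sample file.

    ```cobol
           identification division.
           program-id. csvtable.
           environment division.
           input-output section.
           file-control.
               select in-file assign to "test.csv"
                   organization is line sequential.
           data division.
           file section.
           fd  in-file.
          * The record area only describes one raw line of the file.
           01  in-record               pic x(80).
           working-storage section.
           01  eof-switch              pic x value "N".
           01  rec-counter             pic 9(3) value zero.
          * Parsed results live here, untouched by later READs.
          * The names and the 3-row size are assumptions.
           01  census-table.
               05  census-row occurs 3 times.
                   10  geoid           pic 9(10).
                   10  sumlev          pic 9(3).
                   10  state-code      pic x(1).
           procedure division.
           000-main.
               open input in-file
          * Stop at end-of-file or when the table is full.
               perform until eof-switch = "Y" or rec-counter >= 3
                   read in-file
                       at end
                           move "Y" to eof-switch
                       not at end
                           add 1 to rec-counter
                           unstring in-record delimited by ","
                               into geoid (rec-counter)
                                    sumlev (rec-counter)
                                    state-code (rec-counter)
                           end-unstring
                           display "GEOID  >> " geoid (rec-counter)
                           display "SUMLEV >> " sumlev (rec-counter)
                           display "STATE  >> " state-code (rec-counter)
                   end-read
               end-perform
               close in-file
               stop run.
    ```

    Here the source of the UNSTRING (in-record) and its targets (the table entries) no longer share storage, and the UNSTRING only runs when a record was actually read.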

    READ filename INTO data-description is an implicit MOVE of the data from the record-area to data-description. If, as you have specified, data-description is the record-area itself, the result is "undefined". If you only want the data in the record-area, a plain READ filename is all that is needed.
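
    For illustration only, here is a sketch of READ ... INTO with a target that is separate from the record area. The field ws-line, the program name and the 80-byte length are hypothetical choices of mine.

    ```cobol
           identification division.
           program-id. readinto.
           environment division.
           input-output section.
           file-control.
               select in-file assign to "test.csv"
                   organization is line sequential.
           data division.
           file section.
           fd  in-file.
           01  in-record               pic x(80).
           working-storage section.
           01  eof-switch              pic x value "N".
          * Hypothetical copy area, separate from the record area.
           01  ws-line                 pic x(80).
           procedure division.
           000-main.
               open input in-file
               perform until eof-switch = "Y"
          * Acts like READ followed by MOVE in-record TO ws-line.
                   read in-file into ws-line
                       at end
                           move "Y" to eof-switch
                       not at end
                           display ws-line
                   end-read
               end-perform
               close in-file
               stop run.
    ```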

    You have a similar issue with your original UNSTRING. You have the source and target fields referencing the same storage. The result is "undefined", and not the one you want. This is why the unnecessary UNSTRING "worked".

    You have a redundant inline PERFORM. You process "something" after end-of-file: the UNSTRING and the DISPLAYs still run after AT END has been taken. You make things more convoluted by using unnecessary "punctuation" (the semicolons) in the PROCEDURE DIVISION (whose header you've apparently omitted to paste). Try using ADD instead of COMPUTE there. Look at the use of FILE STATUS, and of 88-level condition-names.
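
    A rough sketch of how those suggestions might fit together, assuming a status field and condition-names that I have made up (in-file-status, in-file-ok, in-file-eof). The status values "00" (successful) and "10" (end of file) are the standard ones; anything else is treated here as an error.

    ```cobol
           identification division.
           program-id. filestat.
           environment division.
           input-output section.
           file-control.
               select in-file assign to "test.csv"
                   organization is line sequential
                   file status is in-file-status.
           data division.
           file section.
           fd  in-file.
           01  in-record               pic x(80).
           working-storage section.
          * Set by the runtime after every OPEN, READ and CLOSE.
           01  in-file-status          pic xx.
               88  in-file-ok          value "00".
               88  in-file-eof         value "10".
           01  rec-counter             pic 9(3) value zero.
           procedure division.
           000-main.
               open input in-file
               if not in-file-ok
                   display "OPEN failed, status " in-file-status
                   stop run
               end-if
          * Priming read, then read again at the end of each pass,
          * so nothing is processed after end-of-file.
               read in-file
               perform until not in-file-ok
                   add 1 to rec-counter
                   display rec-counter " " in-record
                   read in-file
               end-perform
               if not in-file-eof
                   display "READ failed, status " in-file-status
               end-if
               close in-file
               stop run.
    ```

    With the status field tested through 88-levels there is no separate switch to maintain, and no AT END phrase is needed on the READ.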

    You don't need a "new line" for DISPLAY, because you get one for free unless you use NO ADVANCING.

    You don't need to "concatenate" in the DISPLAY, because you get that for free as well.
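
    A small sketch of both points about DISPLAY, using literal values borrowed from the first row of the sample data; the program name is made up. Exact spacing of the output may differ between compilers.

    ```cobol
           identification division.
           program-id. showdisp.
           data division.
           working-storage section.
           01  geoid                   pic 9(10) value 4001942600.
           01  sumlev                  pic 9(3)  value 140.
           procedure division.
           000-main.
          * Operands are shown one after another; no "&" needed.
               display "GEOID  >> " geoid
          * NO ADVANCING suppresses the automatic line break.
               display "SUMLEV >> " with no advancing
               display sumlev
               stop run.
    ```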

    DISPLAY and its cousin, ACCEPT, are the verbs which vary the most from compiler to compiler. (They are verbs, not functions: only intrinsic functions are functions in COBOL, except where your compiler supports user-defined functions.) If your compiler supports a SCREEN SECTION in the DATA DIVISION, you can format and process user input in "screens". If you were to use IBM's Enterprise COBOL, you'd have very basic DISPLAY/ACCEPT.

    You "declare a local variable". Do you? In what sense? Local to the program.

    You can pick up quite a lot of tips by looking at COBOL questions here from the last few years.