java bufferedreader filereader

Why does checking isEmpty on each row make a difference when reading a file line by line with BufferedReader in Java?


I have a piece of code that reads a file. The file contains one ID per line, nothing special. However, I'm not sure why each variant of this code produces a different result.

The content of the file looks like this:

a9c536f6-592f-4458-b61b-6ed3f875e990
d6b384d3-8fae-2dd3-ed06-b57cb65b093c
31474dd6-5c3b-4dc4-82cb-7e05121a77e1
b37dd457-da49-4972-9e3d-cd642e975562
f88df909-35bc-453c-b706-d48b5a11af9a
5d75083b-c4a8-4ad1-8494-645aaabfb283
85d8170e-8982-43c2-b574-9d7fe96d1b46
68bf33d0-1b5e-480c-83f2-ae38d9e0c8c8
26ea9d4a-f509-43b2-850f-cc445dccfd33
f9e9bd88-f67d-4e66-b281-eceeb57985eb

    @Test
    void testReadFile() throws Exception
    {
        String fileName = "/Users/user/Downloads/payload";
        // #1
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        Set<String> readUsingCollect = reader.lines().filter(l -> !l.isEmpty()).collect(Collectors.toSet());

        reader = new BufferedReader(new FileReader(fileName));
        String row;

        // #2
        Set<String> readUsingNonEmpty = new HashSet<>();
        while ((row = reader.readLine()) != null && !row.isEmpty()) {
            readUsingNonEmpty.add(row);
        }

        reader = new BufferedReader(new FileReader(fileName));

        // #3
        String row1;
        Set<String> readNonCheckingNonEmpty = new HashSet<>();
        int counter = 0;
        while ((row1 = reader.readLine()) != null) {
            if (row1.isEmpty()) {
                counter += 1;
            }
            readNonCheckingNonEmpty.add(row1);
        }

        Sets.SetView<String> difference = Sets.difference(readNonCheckingNonEmpty, readUsingNonEmpty);

        System.out.println("Size readUsingCollect: " + readUsingCollect.size());
        System.out.println("Size readUsingNonEmpty: " + readUsingNonEmpty.size());
        System.out.println("Size readNonCheckingNonEmpty: " + readNonCheckingNonEmpty.size());
        System.out.println("Counter: " + counter);
        System.out.println("Difference in size: " + difference.size());
        reader.close();
    }

Here's the output from running the code above:

Size readUsingCollect: 7525
Size readUsingNonEmpty: 5498
Size readNonCheckingNonEmpty: 7526
Counter: 2
Difference in size: 2028

I wonder why #2 produces a different result than #1 and #3. As far as I know, the file doesn't contain any empty lines. What puzzles me most is that I would expect #1 and #2 to give the same result, but they don't.


Solution

  • As @Cwift mentioned in a comment:

    The problem with approach #2 is that

        while ((row = reader.readLine()) != null && !row.isEmpty()) {}
    

    does not filter out empty lines - it terminates the loop at the first empty line.

    If you want to read the whole file and just filter out empty lines you need to implement approach #2 as:

        Set<String> readUsingNonEmpty = new HashSet<>();
        while ((row = reader.readLine()) != null) {
            if (!row.isEmpty()) {
                readUsingNonEmpty.add(row);
            }
        }
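
To see the difference in isolation, here is a minimal sketch that runs the three styles against a small in-memory input instead of the real file. The class name `EmptyLineDemo`, the method names, and the sample input are made up for illustration, and `StringReader` stands in for `FileReader` so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class EmptyLineDemo {

    // Approach #2 as written in the question: the isEmpty check is part of
    // the loop condition, so the loop ends at the first empty line.
    public static Set<String> stopAtFirstEmpty(String content) {
        Set<String> result = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String row;
            while ((row = reader.readLine()) != null && !row.isEmpty()) {
                result.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Corrected approach #2: read until end of input, skip empty lines inside the loop.
    public static Set<String> skipEmpty(String content) {
        Set<String> result = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String row;
            while ((row = reader.readLine()) != null) {
                if (!row.isEmpty()) {
                    result.add(row);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Approach #1: the stream filter drops empty lines but keeps reading.
    public static Set<String> viaStream(String content) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            return reader.lines().filter(l -> !l.isEmpty()).collect(Collectors.toSet());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: four ids with an empty line after "b" and after "c".
        String content = "a\nb\n\nc\n\nd\n";
        System.out.println(stopAtFirstEmpty(content)); // [a, b]
        System.out.println(skipEmpty(content));        // [a, b, c, d]
        System.out.println(viaStream(content).size()); // 4
    }
}
```

The numbers in the question fit this picture. The counter in #3 saw two empty lines, yet the set in #3 is only one element larger than the set in #1, presumably because a set stores the empty string only once. Approach #2 stopped at the first of those empty lines, which is why it collected fewer ids.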