java, utf-8, java.util.scanner, utf, byte-order-mark

While reading a CSV I get a question mark at the beginning


I'm working on a small school exercise about Java text I/O. When I read a CSV file containing name prefixes (a Dutch thing) and surnames, I get a question mark at the beginning of the output.

The exercise is to add my own code to an existing project of 3 small files in order to practice text I/O. Project code: https://github.com/Remzi1993/klantenBestand

public void vulNamenLijst() {
    // TODO: Read the file "resources/NamenlijstGroot.csv" and put each line (<name prefix>,<surname>)
    // into the ArrayList namenLijst.

    file = new File("resources/NamenlijstGroot.csv");

    try (
            Scanner scanner = new Scanner(file);
    ) {
        while (scanner.hasNext()) {
            String line = scanner.nextLine();
            String[] values = line.split(",");
            String namePrefix = values[0];
            String surname = values[1];
            namenLijst.add(namePrefix + " " + surname);
        }
    } catch (FileNotFoundException e) {
        System.err.println("Data file doesn't exist!");
    } catch (Exception e) {
        System.err.println("Something went wrong");
        e.printStackTrace();
    }
}

I'm sorry for mixing Dutch and English in the code. I try to write my own code in English, but this exercise already existed and I only needed to add some code at the // TODO to practice text I/O.

This is what I get: Screenshot

My CSV file: CSV file screenshot


Solution

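  • The file starts with a UTF-8 byte order mark (BOM). Once decoded, it is the invisible character U+FEFF at the start of the first line, and it shows up as a question mark when printed. I found an easy solution: remove it if the line starts with it: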

    final String UTF8_BOM = "\uFEFF";
    
    if (line.startsWith(UTF8_BOM)) {
        line = line.substring(1);
    }
    

    A simple working example:

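    File file = new File("resources/NamenlijstGroot.csv");
    final String UTF8_BOM = "\uFEFF";
    
    try (
        // Decode the file explicitly as UTF-8 (this constructor needs Java 10+)
        Scanner scanner = new Scanner(file, StandardCharsets.UTF_8);
    ) {
        // hasNextLine() is the check that matches nextLine()
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
    
            // The BOM decodes to the single character U+FEFF; drop it if present
            if (line.startsWith(UTF8_BOM)) {
                line = line.substring(1);
            }
    
            line = line.strip();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }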
    
            String[] values = line.split(",");
            String namePrefix = values[0];
            String surname = values[1];
            namenLijst.add(namePrefix + " " + surname);
        }
    } catch (FileNotFoundException e) {
        System.err.println("Data file doesn't exist!");
    } catch (Exception e) {
        System.err.println("Something went wrong");
        e.printStackTrace();
    }
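
To see where the character comes from, and to keep the fix in one place, the BOM handling can be pulled into a small helper. The following is only a sketch under a few assumptions: the class name `BomDemo` and the methods `stripBom` and `parseNames` are hypothetical and not part of the exercise project, the input is an in-memory byte array instead of `resources/NamenlijstGroot.csv` so that the example runs on its own, and rows without a comma are simply skipped. It shows that the three BOM bytes `EF BB BF` decode to the single character `\uFEFF`, which only needs to be removed from the first line.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class BomDemo {
    // U+FEFF is what the UTF-8 byte order mark (bytes EF BB BF) decodes to.
    static final String UTF8_BOM = "\uFEFF";

    // Hypothetical helper: returns the text without a leading BOM, if there is one.
    static String stripBom(String text) {
        return text.startsWith(UTF8_BOM) ? text.substring(UTF8_BOM.length()) : text;
    }

    // Hypothetical helper: turns "<prefix>,<surname>" lines into "prefix surname" strings.
    static List<String> parseNames(byte[] csvBytes) {
        List<String> names = new ArrayList<>();
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(csvBytes), StandardCharsets.UTF_8)) {
            boolean firstLine = true;
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (firstLine) {
                    // A BOM can only occur at the very start of the file.
                    line = stripBom(line);
                    firstLine = false;
                }
                line = line.strip();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Limit 2 keeps an empty prefix or surname as its own element.
                String[] values = line.split(",", 2);
                if (values.length < 2) {
                    continue; // assumption: rows without a comma are ignored
                }
                // strip() removes the leading space left by an empty prefix.
                names.add((values[0] + " " + values[1]).strip());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The BOM as it is stored on disk, followed by two sample rows.
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = "van,Dijk\n,Jansen\n".getBytes(StandardCharsets.UTF_8);
        byte[] csv = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, csv, 0, bom.length);
        System.arraycopy(body, 0, csv, bom.length, body.length);

        System.out.println(parseNames(csv));
    }
}
```

Running `main` prints `[van Dijk, Jansen]`. Without the call to `stripBom`, the first entry would start with `\uFEFF`, which a console that cannot display that character typically shows as `?`.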