java excel regex csv

Filter out digits from a CSV File using Java


I am new to CSV parsing. I have a CSV file in which the 3rd column (a description field) may contain one or more 6-digit numbers mixed with other characters. I need to extract those numbers and write them to the adjacent column of the same row, joined with hyphens when there is more than one.

Example:

3rd column                       4th column
=============                    ===========
123456adjfghviu77                123456

shgdasd234567                    234567

123456abc:de234567:c567890d      123456-234567-567890

12654352474                        

Please help. This is what I have done so far.

        String strFile="D:/Input.csv";
        CSVReader reader=new CSVReader(new FileReader(strFile));

        String[] nextline;
        //int lineNumber=0;
        String str="^[\\d|\\s]{5}$";
        String regex="[^\\d]+";

        FileWriter fw = new FileWriter("D:/Output.csv");
        PrintWriter pw = new PrintWriter(fw);


        while((nextline=reader.readNext())!=null){
            //lineNumber++;
            //System.out.println("Line : "+lineNumber);
            if(nextline[2].toString().matches(str)){
                pw.print(nextline[1]);
                pw.append('\n');
                System.out.println(nextline[2]);
            }

        }
        pw.flush();

Solution

  • I suggest matching just the 6-digit chunks and building a new string as you collect the matches:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "123456abc:de234567:c567890d";
    StringBuilder result = new StringBuilder();
    // Match 6-digit chunks that are not directly preceded or followed by a digit
    Pattern pattern = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        if (result.length() == 0) {                      // If the result is still empty
            result.append(matcher.group(0));             // add the 6-digit chunk
        } else {
            result.append("-").append(matcher.group(0)); // else add a delimiter, then the chunk
        }
    }
    System.out.println(result.toString());               // Demo; write this value to your new column
    


    UPDATE: I changed the pattern from "\\d{6}" to "(?<!\\d)\\d{6}(?!\\d)" so that it only matches 6-digit chunks that are not immediately preceded or followed by another digit.
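
To see why the lookarounds matter, here is a small sketch that runs both patterns against the 11-digit value from the question. The class name `LookaroundDemo` and the helper `findAll` are made up for this illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for comparing the two patterns.
class LookaroundDemo {
    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String longRun = "12654352474"; // 11 digits: no 6-digit number should be reported

        // Plain \d{6} takes the first six digits of the longer run.
        System.out.println(findAll("\\d{6}", longRun));                // prints [126543]

        // With lookarounds, a chunk touching another digit is rejected.
        System.out.println(findAll("(?<!\\d)\\d{6}(?!\\d)", longRun)); // prints []
    }
}
```

The plain pattern would put 126543 in the 4th column for that row, whereas the expected output in the question leaves it empty.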

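
To apply this per row, the matching can be wrapped in a helper that takes a parsed row and returns it with the 4th column filled in. This is only a sketch: `SixDigitExtractor`, `extract` and `withExtractedColumn` are hypothetical names. It assumes each row is a `String[]` with the description at index 2, as in the question's `nextline[2]`.

```java
import java.util.Arrays;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the original answer.
class SixDigitExtractor {
    // Same pattern as in the answer: 6 digits with no digit directly before or after.
    private static final Pattern SIX_DIGITS = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    // Returns all 6-digit chunks in the text joined with "-", or "" if there are none.
    static String extract(String text) {
        StringJoiner joined = new StringJoiner("-");
        Matcher m = SIX_DIGITS.matcher(text);
        while (m.find()) {
            joined.add(m.group());
        }
        return joined.toString();
    }

    // Returns a copy of the row with the extracted numbers in the 4th column (index 3).
    // Assumes the row has at least 3 columns and the description is at index 2.
    static String[] withExtractedColumn(String[] row) {
        String[] out = Arrays.copyOf(row, Math.max(row.length, 4));
        out[3] = extract(row[2]);
        return out;
    }

    public static void main(String[] args) {
        String[] row = {"a", "b", "123456abc:de234567:c567890d"};
        // prints [a, b, 123456abc:de234567:c567890d, 123456-234567-567890]
        System.out.println(Arrays.toString(withExtractedColumn(row)));
    }
}
```

In the question's loop, each `nextline` from `reader.readNext()` could be passed through `withExtractedColumn` and the result written out with a CSV writer rather than a bare `PrintWriter`, so that quoting and delimiters are handled for you.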