java forkjoinpool

Process an ArrayList with ForkJoinPool and return two lists


I have a huge List<String[]> of about 500k elements. Validating it takes too long, 35-40 seconds. The validation looks like this:

    Iterator<String[]> iterator = parser.iterate(request.getInputStream()).iterator();
    List<String[]> list = new ArrayList<>();
    List<NotValidRow> badList = new ArrayList<>();
    while (iterator.hasNext()) {
      var tmp = iterator.next();
      if (tmp.length != 2) continue;
      if (tmp[0] == null || !SKIP_PATTERN.matcher(tmp[0]).matches()) {
        badList.add(new NotValidRow(tmp[0], tmp[1], NotValidRowReason.NOT_VALID_EMAIL));
      }
      if (tmp[1] == null || tmp[1].isBlank()) {
        badList.add(new NotValidRow(tmp[0], tmp[1], NotValidRowReason.EMPTY_NAME));
      }
      list.add(tmp);
    }

I think it's possible to do this faster with a fork/join pool, but I don't know how. Could you help me with that?


Solution

  • You can use parallel Stream processing; however, you'll have to collect the bad rows in a thread-safe manner, for example:

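    // Wrap the iterator so it can feed a stream. The 0 means "no characteristics";
    // pass Spliterator.ORDERED instead if the good list must keep the input order.
    var spliterator = Spliterators.spliteratorUnknownSize(iterator, 0);

    // Thread-safe sink for rejected rows. With a parallel stream the order in
    // which rows arrive here is not deterministic.
    var badQueue = new ConcurrentLinkedQueue<NotValidRow>();

    // The second argument (true) makes the stream parallel.
    List<String[]> list = StreamSupport.stream(spliterator, true)
        .filter(tmp -> {
            if (tmp.length != 2) {
                return false;
            }
            if (tmp[0] == null || !SKIP_PATTERN.matcher(tmp[0]).matches()) {
                badQueue.offer(new NotValidRow(tmp[0], tmp[1], NotValidRowReason.NOT_VALID_EMAIL));
                return false;
            }
            if (tmp[1] == null || tmp[1].isBlank()) {
                badQueue.offer(new NotValidRow(tmp[0], tmp[1], NotValidRowReason.EMPTY_NAME));
                return false;
            }
            return true;
        })
        .toList(); // Stream.toList() requires Java 16+

    List<NotValidRow> badList = new ArrayList<>(badQueue);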
    

    Edit: Apparently the OP didn't mean to include the bad entries in the good list, so I've updated the answer to filter them out.
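
  • If you specifically want to use ForkJoinPool directly, as the question asks, a RecursiveTask can split the work and return both lists without any shared mutable state. The following is only a minimal sketch under some assumptions: the rows have already been read into a random-access List<String[]>; NotValidRow, NotValidRowReason and SKIP_PATTERN are stand-ins here because the question does not show their definitions; Result, ValidateTask and THRESHOLD are hypothetical names introduced for illustration. It keeps the same "first failure wins" rule as the filter above. Note that this can only speed up the validation itself; if most of the 35-40 seconds is spent reading and parsing the input stream, parallel validation is unlikely to help much.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.regex.Pattern;

final class ForkJoinValidation {
    // Stand-ins for the question's types; the real definitions are not shown there.
    enum NotValidRowReason { NOT_VALID_EMAIL, EMPTY_NAME }
    record NotValidRow(String email, String name, NotValidRowReason reason) {}

    // Hypothetical holder for the two result lists.
    record Result(List<String[]> good, List<NotValidRow> bad) {}

    // Placeholder pattern (assumption): the question's SKIP_PATTERN is not shown.
    static final Pattern SKIP_PATTERN = Pattern.compile("[^@\\s]+@[^@\\s]+");

    // Below this many rows a chunk is validated sequentially. Arbitrary value; tune it.
    static final int THRESHOLD = 10_000;

    static final class ValidateTask extends RecursiveTask<Result> {
        private final List<String[]> rows;
        private final int from; // inclusive
        private final int to;   // exclusive

        ValidateTask(List<String[]> rows, int from, int to) {
            this.rows = rows;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Result compute() {
            if (to - from <= THRESHOLD) {
                return validateSequentially();
            }
            int mid = (from + to) >>> 1;
            ValidateTask left = new ValidateTask(rows, from, mid);
            ValidateTask right = new ValidateTask(rows, mid, to);
            left.fork();                          // run the left half asynchronously
            Result rightResult = right.compute(); // run the right half in this thread
            Result leftResult = left.join();
            // Append right to left so both lists keep the input order.
            leftResult.good().addAll(rightResult.good());
            leftResult.bad().addAll(rightResult.bad());
            return leftResult;
        }

        private Result validateSequentially() {
            List<String[]> good = new ArrayList<>();
            List<NotValidRow> bad = new ArrayList<>();
            for (int i = from; i < to; i++) {
                String[] tmp = rows.get(i);
                if (tmp.length != 2) {
                    continue; // malformed rows are dropped silently, as in the question
                }
                if (tmp[0] == null || !SKIP_PATTERN.matcher(tmp[0]).matches()) {
                    bad.add(new NotValidRow(tmp[0], tmp[1], NotValidRowReason.NOT_VALID_EMAIL));
                } else if (tmp[1] == null || tmp[1].isBlank()) {
                    bad.add(new NotValidRow(tmp[0], tmp[1], NotValidRowReason.EMPTY_NAME));
                } else {
                    good.add(tmp);
                }
            }
            return new Result(good, bad);
        }
    }

    static Result validate(List<String[]> rows) {
        return ForkJoinPool.commonPool().invoke(new ValidateTask(rows, 0, rows.size()));
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"a@example.com", "Alice"},
            new String[] {"not-an-email", "Bob"},
            new String[] {"c@example.com", " "},
            new String[] {"only-one-column"});
        Result result = validate(rows);
        // prints "1 good, 2 bad"
        System.out.println(result.good().size() + " good, " + result.bad().size() + " bad");
    }
}
```

    Because each task builds its own lists and the halves are merged after join(), no concurrent collection is needed, and both lists come back in input order. The records require Java 16+, the same baseline as Stream.toList() above.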