
More efficient way to split a string


I've been wondering about this for a long time. Is there a clean way to extract the text between two delimiters without doing redundant splits? For example, given the text:

String text = "ASD WORD-BE HERE YUP";

if we want to extract "BE", we have to do two splits:

String extractedWord = text.split(" ")[1].split("-")[1];

Is there a better way to do this without writing an exact regular expression? I'm working on a parser that reads a PDF. The value I'm after is a date, but it isn't in a standard format, so I would have to look specifically for "MONTH - MONTH, DAY, YEAR", which is a bit hard to set up as a regex. Thanks!
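For the date case, a regex with named capture groups may be less painful than it sounds. The sketch below assumes the months are full English names (any letter case), the day is one or two digits, the year is four digits, and the parts are separated by "-" and "," with optional whitespace. The class name `DateRangeExtract`, the `extract` helper, and the sample input are hypothetical, not taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRangeExtract {
    // Assumption: full English month names; adjust if the PDF uses abbreviations.
    private static final String MONTH =
            "(?:JANUARY|FEBRUARY|MARCH|APRIL|MAY|JUNE|JULY|AUGUST"
            + "|SEPTEMBER|OCTOBER|NOVEMBER|DECEMBER)";

    // "MONTH - MONTH, DAY, YEAR" with flexible whitespace; \b keeps "MAY"
    // from matching inside words like "MAYBE". Each part is a named group.
    private static final Pattern DATE_RANGE = Pattern.compile(
            "\\b(?<from>" + MONTH + ")\\s*-\\s*(?<to>" + MONTH + ")"
            + "\\s*,\\s*(?<day>\\d{1,2})\\s*,\\s*(?<year>\\d{4})\\b",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns {from, to, day, year}, or null if no match.
    public static String[] extract(String text) {
        Matcher m = DATE_RANGE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group("from"), m.group("to"), m.group("day"), m.group("year")};
    }

    public static void main(String[] args) {
        // Hypothetical line of text pulled from the PDF
        String[] parts = extract("Billing period: JANUARY - MARCH, 15, 2021 (final)");
        System.out.println(parts == null ? "no date range found" : String.join(" | ", parts));
    }
}
```

Named groups (`m.group("day")`) keep the extraction readable even as the pattern grows, and `find()` locates the date anywhere in the line, so the surrounding text does not need to be split first.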


Solution

  • One option is a regex capture group, although it ends up being more code:

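    import java.util.regex.Pattern;

    // Capture the (lazily matched) text between a '-' and the next whitespace character
    var pattern = Pattern.compile("-(.*?)\\s");
    var matcher = pattern.matcher("ASD WORD-BE HERE YUP");
    var extracted = matcher.find() ? matcher.group(1) : null; // null when nothing matches

    // Null-safe comparison; run with -ea to enable assertions
    assert "BE".equals(extracted);
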

    One potential issue with your code is that it assumes the input is always valid (which may be fine for your use case): you never check the length of the arrays returned by String#split before indexing into them.
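A minimal sketch of what those checks could look like. `viaSplit` is the question's two-split approach with the array lengths checked, returning null instead of throwing `ArrayIndexOutOfBoundsException`. `viaIndexOf` is a swapped-in alternative that uses `String#indexOf` and `String#substring`, so no arrays or regex are involved; note it takes the first start delimiter anywhere in the text, not specifically the one in the second word. The class and method names are hypothetical.

```java
public class BetweenDelimiters {
    // The question's text.split(" ")[1].split("-")[1], with length checks.
    // Returns null when the input does not have the expected shape.
    public static String viaSplit(String text) {
        String[] words = text.split(" ");
        if (words.length < 2) {
            return null;
        }
        String[] parts = words[1].split("-");
        return parts.length < 2 ? null : parts[1];
    }

    // Alternative without split or regex: find the start delimiter, then the
    // first end delimiter after it, and return the text in between.
    // Returns null when either delimiter is missing.
    public static String viaIndexOf(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = text.indexOf(end, from);
        return to < 0 ? null : text.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "ASD WORD-BE HERE YUP";
        System.out.println(viaSplit(text));             // BE
        System.out.println(viaIndexOf(text, "-", " ")); // BE
        System.out.println(viaSplit("ASD"));            // null
    }
}
```

Returning null keeps the sketch short; depending on the parser, `Optional<String>` or a descriptive exception may be a better way to report malformed lines.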