I have a text file in the following format: each line starts with a string, followed by a sequence of numbers. Each line has an unknown length (an unknown count of numbers, anywhere from 0 to 1000).
string_1 3 90 12 0 3
string_2 49 0 12 94 13 8 38 1 95 3
.......
string_n 9 43
Afterwards I must handle each line with a handleLine method, which accepts two arguments: the string name and the set of numbers (see code below). How can I read the file and handle each line with handleLine efficiently?
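My workaround (code below) has three steps:
1. Read the file with Files.lines. Is it blocking?
2. Split each line into tokens with a regex (the parse method).
3. Convert the tokens into a String and a Set<Integer>.
I think this is pretty inefficient because of the 2nd and 3rd steps: the 1st step means Java first converts the file bytes to strings, and then in the 2nd and 3rd steps I convert them again into a String/Set<Integer>. Does that influence performance a lot? If so, how can I do better?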
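import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public void handleFile(String filePath) {
    // Files.lines returns a lazily populated Stream of the file's lines
    try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
        stream.forEach(this::indexLine);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private void indexLine(String line) {
    List<String> resultList = this.parse(line);
    String string_i = resultList.remove(0); // first token is the name
    Set<Integer> numbers = resultList.stream().map(Integer::valueOf).collect(Collectors.toSet());
    handleLine(string_i, numbers); // Here is the final computation, which must be done only with the string_i & numbers arguments (defined elsewhere)
}

// Splits a line into runs of digits, lowercase letters, or uppercase letters.
// Note: "_" matches none of these, so a name like "string_1" yields the tokens "string" and "1".
private List<String> parse(String str) {
    List<String> output = new LinkedList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}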
Regarding your first question, it depends on how you use the Stream. Streams are inherently lazy and don't do any work until something consumes them. For example, the call to Files.lines doesn't actually read any lines from the file until you add a terminal operation on the Stream.
From the Javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed
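A small sketch of that laziness, assuming a throwaway temp file with made-up content in the question's format (the class and method names are hypothetical): peek counts each line as it flows through the pipeline, and the count stays at 0 until a terminal operation runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyLinesDemo {

    // Returns {lines seen before the terminal operation, lines seen after it}.
    static int[] countLinesSeen(Path file) throws IOException {
        AtomicInteger seen = new AtomicInteger();
        try (Stream<String> stream = Files.lines(file)) {
            // peek is an intermediate operation: this only builds the pipeline
            Stream<String> pipeline = stream.peek(line -> seen.incrementAndGet());
            int before = seen.get();

            // forEach is a terminal operation: lines are read and pushed through now
            pipeline.forEach(line -> { });
            return new int[] {before, seen.get()};
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample file in the question's format
        Path file = Files.createTempFile("lines", ".txt");
        try {
            Files.write(file, Arrays.asList("string_1 3 90 12 0 3", "string_2 49 0 12", "string_n 9 43"));
            int[] seen = countLinesSeen(file);
            System.out.println("before: " + seen[0] + ", after: " + seen[1]);
        } finally {
            Files.delete(file);
        }
    }
}
```

Running main should print before: 0, after: 3.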
The forEach(Consumer<T>) call is a terminal operation, and at that point the lines of the file are read one by one and passed to your indexLine method.
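To see the one-by-one behavior, this sketch logs when each line leaves Files.lines and when the consumer receives it. The consumer is a hypothetical stand-in for indexLine and the file content is made up. With a sequential stream and no stateful operations such as sorted, each line reaches the consumer before the next one is pulled, so the whole file is never collected into memory first (the underlying reader still buffers bytes in larger chunks).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class OneByOneDemo {

    // Records the order in which lines leave Files.lines and reach the consumer.
    static List<String> traceOrder(Path file) throws IOException {
        List<String> events = new ArrayList<>();
        try (Stream<String> stream = Files.lines(file)) {
            stream.peek(line -> events.add("emitted " + line))
                  // hypothetical stand-in for the question's indexLine method
                  .forEach(line -> events.add("handled " + line));
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample file in the question's format
        Path file = Files.createTempFile("lines", ".txt");
        try {
            Files.write(file, Arrays.asList("string_1 3 90", "string_2 49 0"));
            traceOrder(file).forEach(System.out::println);
        } finally {
            Files.delete(file);
        }
    }
}
```

The output alternates between emitted and handled entries, one line at a time.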
Regarding your other comments, you don't really have a question here. What are you trying to measure or minimize? Just because something takes multiple steps doesn't inherently make it perform poorly. Even if you wrote a whiz-bang one-liner to convert from the File bytes directly to your String & Set, you probably just did the intermediate mapping anonymously, or called something that does the same conversion internally anyway.
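To illustrate: the one-liner below looks like a single step, but split still creates an intermediate String for every token before Integer::valueOf converts it. Avoiding those Strings takes a hand-written scan such as numbersScanned, a hypothetical helper that assumes whitespace-separated, non-negative numbers that fit in an int; it still boxes each value into an Integer for the Set. Only the numbers are parsed here; the name token is simply skipped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class LineParsing {

    // One-liner: still splits the line into intermediate String tokens
    // before converting them, it just doesn't name them.
    static Set<Integer> numbersOneLiner(String line) {
        return Arrays.stream(line.trim().split("\\s+")).skip(1).map(Integer::valueOf).collect(Collectors.toSet());
    }

    // Hand-written scan: builds each int digit by digit, creating no String per number.
    // Assumes whitespace-separated, non-negative numbers that fit in an int.
    static Set<Integer> numbersScanned(String line) {
        Set<Integer> numbers = new HashSet<>();
        int i = 0;
        int n = line.length();
        // skip leading whitespace, then the name token
        while (i < n && Character.isWhitespace(line.charAt(i))) i++;
        while (i < n && !Character.isWhitespace(line.charAt(i))) i++;
        while (i < n) {
            char c = line.charAt(i);
            if (c >= '0' && c <= '9') {
                int value = 0;
                while (i < n && line.charAt(i) >= '0' && line.charAt(i) <= '9') {
                    value = value * 10 + (line.charAt(i) - '0');
                    i++;
                }
                numbers.add(value);
            } else {
                i++; // whitespace between numbers
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String line = "string_1 3 90 12 0 3";
        // Both approaches produce the same set; the duplicate 3 collapses into one entry
        System.out.println(numbersOneLiner(line).equals(numbersScanned(line)));
    }
}
```

Whether the scan is measurably faster depends on your data and JVM, so measure it (for example with a benchmark harness such as JMH) before trading readability for it.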