I'm reading an input stream from which I can parse a list of objects. On the other hand I have a method that accepts a list of such objects.
Something like:
fun InputStream.parser(): List<Object> {
    val allBytes = this.readAllBytes()
    val result = mutableListOf<Object>()
    var currentIndex = 0
    for (i in allBytes.indices) {
        if (allBytes[i] == '\n'.toByte()) {
            result.add(myParser.parse(allBytes, currentIndex, i))
            currentIndex = i + 1
        }
    }
    return result
}

fun consumer(input: List<Object>): Unit {
    for (obj in input) {
        // Do something with it
    }
}
As an example, parser could be reading from an NDJSON file, and consumer could be sending one property of each object over the network.
This implementation is not good: it requires loading the whole file into memory, and then the whole list of objects as well.
How could I do this without that overhead?
I guess something like Stream.generate, where each generation is a new object, would work, but I'm not sure how to close it, since Stream.generate is meant for infinite streams.
I'm guessing it shouldn't be an Iterable, since iterables are expected to be iterable several times, which is not the case here: once consumed, the data is lost.
Note: my actual code does not rely on lines; NDJSON is just an example, so I'm looking for code that doesn't use BufferedReader.readLines.
You can create a Sequence, which is basically a lazy list.
It seems like the input stream contains text, and you want to read lines in the stream, so I would suggest operating on a BufferedReader instead.
fun BufferedReader.parser() = sequence {
    for ((currentIndex, line) in lineSequence().withIndex()) {
        yield(parser.parse(line, currentIndex))
    }
}

// this can also be simplified to
fun BufferedReader.parser() = lineSequence()
    .mapIndexed { currentIndex, line -> parser.parse(line, currentIndex) }
The code inside the sequence { ... } lambda will only be run when a new element is requested, and only runs until the next call to yield. The consumer will take a Sequence<Object> and consume it with a for loop, for example.
Note that this sequence can only be consumed once, and you should make sure that you do not consume it after the buffered reader has been closed.
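To make the lifetime point concrete, here is a minimal sketch of consuming the sequence while the reader is still open. Record and the upper-casing step are hypothetical stand-ins for your object type and for whatever the consumer really does; the only assumption about your code is that the consumer can be changed to accept a Sequence.

```kotlin
import java.io.BufferedReader
import java.io.StringReader

// Hypothetical stand-in for the question's object type.
data class Record(val index: Int, val text: String)

// Lazily maps each line to a Record; nothing is read until the sequence is iterated.
fun BufferedReader.parser(): Sequence<Record> =
    lineSequence().mapIndexed { index, line -> Record(index, line) }

// The consumer takes a Sequence instead of a List and pulls one element at a time.
fun consumer(input: Sequence<Record>): List<String> {
    val sent = mutableListOf<String>()
    for (record in input) {
        sent.add(record.text.uppercase()) // placeholder for "do something with it"
    }
    return sent
}

fun main() {
    // `use` closes the reader when the block ends, so the sequence
    // must be fully consumed inside the block, as it is here.
    val result = BufferedReader(StringReader("a\nb\nc")).use { reader ->
        consumer(reader.parser())
    }
    println(result) // prints [A, B, C]
}
```

Returning reader.parser() out of the use block and iterating it afterwards would fail, because the reader is already closed by then.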
You can easily get a BufferedReader from an InputStream using .bufferedReader().
If my assumptions are wrong and you need an InputStream and not a Reader, the same principle applies. Write a loop that only reads as much data as you need in each iteration, and yield the parse result.
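As a rough sketch of that loop, assuming records are separated by a single delimiter byte as in your example: parseRecord is a hypothetical placeholder for your real parser, and emitting a trailing record that has no closing delimiter is my choice, not something your original code does.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical parser: turns the bytes of one record into a String.
fun parseRecord(bytes: ByteArray): String = bytes.toString(Charsets.UTF_8)

// Reads the stream lazily and yields one parsed record per delimiter-terminated chunk.
// Only the bytes of the current record are held in memory.
fun InputStream.parser(delimiter: Byte = '\n'.code.toByte()): Sequence<String> =
    sequence<String> {
        val input = this@parser.buffered() // avoids one underlying read per byte
        val record = ByteArrayOutputStream()
        while (true) {
            val b = input.read()
            if (b == -1) break // end of stream
            if (b.toByte() == delimiter) {
                yield(parseRecord(record.toByteArray()))
                record.reset()
            } else {
                record.write(b)
            }
        }
        // Assumption: a trailing record without a closing delimiter is still emitted.
        if (record.size() > 0) yield(parseRecord(record.toByteArray()))
    }.constrainOnce() // fail fast if someone tries to iterate a second time

fun main() {
    val data = "one\ntwo\nthree".toByteArray()
    ByteArrayInputStream(data).use { stream ->
        for (item in stream.parser()) println(item)
    }
}
```

constrainOnce() makes the single-use nature explicit: a second iteration throws instead of silently returning nothing from the exhausted stream.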