The question may be generic, but I am trying to understand the major implications here.
I am doing some bytecode engineering with the BCEL library, and part of the workflow requires me to read the same bytecode file multiple times (from the beginning). The flow is as follows:
// 1. Get Input Stream
// 2. Do some work
// 3. Finish
// 4. Do some other work.
At step 4, I will need to reset to the mark or get the stream as though it were being read from the beginning. I know of the following choices:
1) Wrap the stream using BufferedInputStream
- chance of getting "Resetting to invalid mark" IOException
2) Wrap it using ByteArrayInputStream - it always works even though some online research suggests that it's erroneous?
3) Simply call getInputStream() if I need to read from the stream again.
I am trying to understand which option would be better for me. I don't want to use BufferedInputStream because I have no idea where mark was last called, so calling reset once that mark is no longer valid will throw an IOException. I would prefer ByteArrayInputStream since it requires the least code change for me, but could anyone suggest whether option 2 or option 3 would be better?
I know that the implementations of mark() and reset() differ between ByteArrayInputStream and BufferedInputStream in the JDK.
Regards
The problem with mark/reset is not only that you have to know in advance the maximum amount of data that will be read between these calls; you also have to know whether the code you’re delegating to will use that feature internally, rendering your mark obsolete. It’s impossible for code using mark/reset to remember and restore a previous mark for the caller.
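To make the second point concrete, here is a minimal sketch of a callee silently replacing the caller's mark. The libraryParse method is a hypothetical stand-in for any library routine that happens to use mark/reset internally; it is not taken from BCEL.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkClobberDemo {

    // Hypothetical library method that uses mark/reset for its own purposes.
    static void libraryParse(InputStream in) throws IOException {
        in.read();     // consume the first byte
        in.mark(16);   // replaces whatever mark the caller had set
        in.read();     // peek at the next byte
        in.reset();    // returns to the library's mark, not the caller's
    }

    // Returns the first byte the caller sees after its own reset().
    static int firstByteAfterReset() throws IOException {
        byte[] data = {10, 20, 30, 40};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        in.mark(data.length);  // caller marks position 0
        libraryParse(in);
        in.reset();            // caller expects to be back at position 0
        return in.read();
    }

    public static void main(String[] args) throws IOException {
        // Prints 20 rather than 10: reset() went to the library's mark.
        System.out.println(firstByteAfterReset());
    }
}
```

No exception is thrown here, which is arguably worse than the "Resetting to invalid mark" case: the caller simply continues from the wrong position.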
So while it would be possible to fix the maximum issue by specifying the total file size as the readlimit, you can never rely on a working mark when passing the InputStream to an arbitrary library function that does not explicitly document that it never uses the mark/reset feature internally.
Also, a BufferedInputStream given a readlimit matching the total file size would not be more efficient than a ByteArrayInputStream wrapping an array holding the entire file, as both end up maintaining a buffer of the same size.
The best solution would be to read the entire class file into an array once and use the array directly, e.g. for code under your control or when you have a choice regarding the library (ASM’s ClassReader supports using a byte array instead of an InputStream, for example).
If you have to feed an InputStream to a library function insisting on it, like BCEL, then wrap the byte array into a ByteArrayInputStream when needed, but create a new ByteArrayInputStream each time you have to re-parse the class file. Constructing the new ByteArrayInputStream costs next to nothing, as it is a lightweight wrapper, and it is reliable, as it does not depend on the state of an older input stream in any way. You could even have multiple ByteArrayInputStream instances reading the same array at the same time.
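A rough sketch of that pattern follows. The parseMagic method is a hypothetical placeholder for whatever library call needs the stream (with BCEL that would be the place where you hand the stream to its parser), and the sample bytes are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReparseDemo {

    // Read the whole stream into memory once.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for a library call that insists on an InputStream
    // and may set its own mark. Returns the first four bytes as an int.
    static int parseMagic(InputStream in) throws IOException {
        in.mark(4);
        int magic = 0;
        for (int i = 0; i < 4; i++) {
            magic = (magic << 8) | in.read();
        }
        return magic;
    }

    static boolean demo() throws IOException {
        // Made-up sample data standing in for the original input stream.
        byte[] source = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] classBytes = readFully(new ByteArrayInputStream(source));

        // Each pass gets a fresh wrapper, so earlier passes cannot affect it.
        int first = parseMagic(new ByteArrayInputStream(classBytes));
        int second = parseMagic(new ByteArrayInputStream(classBytes));

        // Two wrappers over the same array keep independent positions.
        InputStream a = new ByteArrayInputStream(classBytes);
        InputStream b = new ByteArrayInputStream(classBytes);
        a.read();
        boolean independent = b.read() == 0xCA;

        return first == 0xCAFEBABE && second == 0xCAFEBABE && independent;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Because ByteArrayInputStream only holds a reference to the array plus a few position fields, the array itself is never copied when a new wrapper is created.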
Calling getInputStream() again would be an option if you had to deal with really large files for which buffering the entire contents is not feasible; however, this is not the case for class files.
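For completeness, re-opening the source for each pass could look roughly like this. Since the question does not say where getInputStream() comes from, the sketch assumes the data lives in a file and uses Files.newInputStream as a stand-in; countBytes is a hypothetical placeholder for the real work.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReopenDemo {

    // Hypothetical pass over the data: opens a fresh stream and counts its bytes.
    static long countBytes(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            long total = 0;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    // Runs two independent passes over a temporary file.
    static long[] twoPasses() throws IOException {
        Path tmp = Files.createTempFile("reopen-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
            return new long[] {countBytes(tmp), countBytes(tmp)};
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = twoPasses();
        System.out.println(sizes[0] + " " + sizes[1]);
    }
}
```

Each pass starts from the beginning without holding the contents in memory, at the price of going back to the underlying source every time.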