java  java-stream  lazy-evaluation  try-with-resources  resource-leak

How do I avoid using try-with-resources or close() in this example?


I am trying to stream data from the internet using java.util.stream.Stream. I have an implementation that works. Here it is below.

final URL url =
   new
      URI
      (
         "INSERT YOUR HYPERLINK HERE. Mine is this 5.2 gb file --> https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv"
      )
      .toURL()
      ;

try
(
   final InputStream inputStream = url.openStream();
   final InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
   final BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
)
{
   final Stream<String> lines = bufferedReader.lines();
   //do what you want with the stream here.    
}
catch (final Exception exception)
{
   throw new RuntimeException(exception);
}

Because it is a Stream, the execution is lazy. It only fetches as much from the upstream source (the internet) as the pipeline actually consumes, plus whatever the readers buffer. This is ideal because I can short-circuit and save myself from downloading more from the internet than needed.
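To make the laziness concrete, here is a minimal sketch that runs offline. EndlessCsvReader and charsReadFor are hypothetical names, and the endless reader is only an assumed stand-in for a very large remote file. The point is that limit() stops the pipeline, so only a bounded amount is pulled from the source (the requested lines plus whatever BufferedReader buffers).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

class LazyLinesDemo {
    // Hypothetical stand-in for a huge remote file: it never runs out of
    // lines and counts how many characters have been requested from it.
    static final class EndlessCsvReader extends Reader {
        private static final String LINE = "2020-01-21,1,0\n";
        long charsServed = 0;

        @Override
        public int read(char[] buffer, int offset, int length) {
            for (int i = 0; i < length; i++) {
                buffer[offset + i] = LINE.charAt((int) ((charsServed + i) % LINE.length()));
            }
            charsServed += length;
            return length;
        }

        @Override
        public void close() { }
    }

    // Pulls the first `wanted` lines and reports how many characters were read from the source.
    static long charsReadFor(int wanted) {
        EndlessCsvReader source = new EndlessCsvReader();
        try (BufferedReader reader = new BufferedReader(source)) {
            List<String> firstLines = reader.lines().limit(wanted).collect(Collectors.toList());
            if (firstLines.size() != wanted) {
                throw new IllegalStateException("unexpected line count");
            }
            return source.charsServed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Terminates even though the source is endless, because limit() short-circuits.
        System.out.println("characters read for 10 lines: " + charsReadFor(10));
    }
}
```

The same reasoning should carry over to url.openStream(), although the exact amount downloaded there also depends on network and decoder buffering.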

The problem though is all of the try-with-resources. I am not opposed to writing them, but I plan to use this implementation as part of a team. So, a lot of people would be using it. To avoid errors, I want to remove at least one particular tripping hazard -- the need for a try-with-resources.

I would like to avoid running into memory/resource issues because someone forgot to do a try-with-resources. Or a close() method, since that would be the same problem, but even easier to forget.

How would you all work around this problem?


Here is my progress thus far.

I saw this talk by Venkat Subramaniam, in which he described the "Execute-Around Method Pattern" --> https://www.youtube.com/watch?v=yTuwi--LFsM&t=7920s

Long story short, the "Execute-Around Method Pattern" allows you to dodge the concern of resource misuse by simply not letting your callers have the resource. Instead, you let the caller pass in a java.util.function.Consumer whose parameterized type is the resource you are not letting them have, and then let them specify in the Consumer the functions they would have called on the resource if you had let them have it.

On the one hand, this pattern works perfectly for my problem. The Stream does all the things the Consumer tells it to, and then it closes itself once the Consumer completes. Here is a complete, runnable implementation that I made.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.util.Comparator;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;

public class StreamDataFromTheInternet
{
   public static void main(final String[] args) throws Exception
   {
      StreamDataFromTheInternet
         .executeAroundMethodPattern
         (
            "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv" //THIS IS A 5 GIGABYTE FILE
            ,
            lines ->
               lines
                  .filter(line -> line.length() > 20) //filter down to lines that are >20 characters
                  .limit(10)                          //of those lines, only grab the first 10
                  .forEach(System.out::println)       //print them out to console
         )
         ;
      //Doing it this way allowed us to avoid downloading the entire 5GB file.
      //Instead, we downloaded only a couple of kb.
      //Basically, we only downloaded as many lines as we needed to in order
      //to complete the Stream.
   }

   private static void executeAroundMethodPattern(final String hyperlink, Consumer<Stream<String>> functionToExecute) throws Exception
   {
      final URL url =
         new
            URI
            (
               hyperlink
            )
            .toURL()
            ;
   
      try
      (
         final InputStream inputStream = url.openStream();
         final InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
         final BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
      )
      {
         final Stream<String> lines = bufferedReader.lines();
         functionToExecute.accept(lines);
      }
      catch (final Exception exception)
      {
         throw new RuntimeException(exception);
      }
   }
}

So, this looks like exactly what I am looking for... assuming that I am only writing code that involves side effects.

But really, most instances of Stream are supposed to produce a value. Doing a side-effect is really the secondary use-case of a Stream. I want to provide the primary use-case, but with all the safety that I am giving the secondary use-case.

I'll spare you the long rabbit hole and basically boil it down to saying -- returning a value is easy enough, but I couldn't find a way to stop the user from just returning the Stream itself, nullifying the entire point of what I am trying to do here.

Any ideas of how to work around it?

EDIT - I see that I was unclear with the last paragraph. Apologies for that, I was certainly not descriptive enough.

Yes, I obviously could swap my Consumer for a Function, but a problem very quickly arises with only a little testing --- what happens if the user returns the Stream itself?

Long story short, the Stream will blow up in their face if it attempts to fetch any more data from the internet because the try-with-resources block closed the readers. You cannot fetch data from a reader that is closed, and if you attempt to do so, you will get an Exception.
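Here is a minimal sketch of that failure mode, runnable offline. withLines is a hypothetical Function-based variant of executeAroundMethodPattern, and a Supplier<Reader> over a StringReader is an assumed stand-in for url.openStream(). Because BufferedReader.lines() is lazy, the escaped stream only fails once a terminal operation tries to read from the already closed reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;

class FunctionVariantDemo {
    // Hypothetical value-returning variant of executeAroundMethodPattern.
    // The Supplier<Reader> replaces url.openStream() so the sketch needs no network.
    static <T> T withLines(Supplier<Reader> source, Function<Stream<String>, T> function) {
        try (BufferedReader reader = new BufferedReader(source.get())) {
            return function.apply(reader.lines());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Supplier<Reader> source = () -> new StringReader("a\nbb\nccc\n");

        // Intended use: reduce the stream to a value inside the function.
        long longLines = withLines(source, lines -> lines.filter(l -> l.length() > 1).count());
        System.out.println("lines longer than 1 char: " + longLines);

        // Misuse: the stream itself escapes. Nothing has been read yet,
        // so the problem only surfaces when a terminal operation runs.
        Stream<String> escaped = withLines(source, lines -> lines);
        try {
            escaped.forEach(System.out::println);
        } catch (UncheckedIOException e) {
            // The reader was closed when withLines returned.
            System.out.println("escaped stream failed: " + e.getCause());
        }
    }
}
```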

So, no. Simply swapping a Consumer for a Function will not solve my problem here. Ultimately, all it does is swap one problem for another. Instead of a resource leak, we now have a Stream that could blow up in your face for using it in an otherwise innocent way. Yes, code reviews and the like could prevent this, but if I could depend on code reviews to solve this problem, I would not have asked the question in the first place. The team I intend this code for simply contains too many juniors producing too much code, so the reviewers are overburdened. I cannot depend on reviews, and I would rather not release the feature than try to depend on reviews.

I want to create a guard rail, not a warning sign. And the suggested solution to simply return an unusable stream is like having a guard rail with a gap in the middle of it -- it looks like it protects you unless you happen to crash near the gap. More specifically, it is like having a gap in our guard rail, and a warning sign warning you about the gap in the guard rail lol. In that respect, it is the worst of both worlds.

And finally, let me say that I want a durable guard rail, or no road. If what I ask is impossible, and you can demonstrate that, by all means, I don't need to have the road. This function I am providing is a convenience to help people write better code. It enables a safe abstraction. But an abstraction is only as good as it is airtight. The second it starts to leak (warning sign vs guard rail), its usefulness drops exponentially. I am not saying that I need an abstraction that can model my entire context, I just need it to model a single use-case, and accomplish that with the existing semantics of a Stream. If that cannot be maintained, then I would rather abandon the abstraction (Stream) entirely.


Solution

  • You wrote:

    returning a value is easy enough, but I couldn't find a way to stop the user from just returning the Stream itself, nullifying the entire point of what I am trying to do here.

    There is no other way to stop developers from returning the Stream than documenting that this won’t work. To show an example from the experts:

    StackWalker.walk(Function)

    public <T> T walk(Function<? super Stream<StackWalker.StackFrame>,? extends T> function)

    Applies the given function to the stream of StackFrames for the current thread, traversing from the top frame of the stack, which is the method calling this walk method.

    The StackFrame stream will be closed when this method returns. When a closed Stream<StackFrame> object is reused, IllegalStateException will be thrown.

    So the JDK developers did not find an alternative to just documenting the restriction. But it seems they do not share your opinion about “juniors”, which would mean omitting a feature just because it could be used in the wrong way. Otherwise, since there is no “airtight” software in real life, you would end up with no software at all.

    A solution that fails immediately on the first test when used incorrectly is not “nullifying the entire point” of your solution, as your original point was “To avoid errors, I want to remove at least one particular tripping hazard -- the need for a try-with-resources”. This goal is still achieved and it’s striking that your original idea explicitly mentioned removing one tripping hazard, not all of them.

    Even with a Consumer, you can’t prevent callers from providing a consumer storing the Stream in a heap variable, to attempt to use it later, or forgetting the filter when using a non-short-circuiting terminal operation.

    What will happen when you deny your “juniors” this method because it is not “airtight”? They will have to implement this functionality themselves, including those potential errors your method already avoids. Like just doing the tempting one-liner new BufferedReader(new InputStreamReader(url.openStream())).lines().forEach(…)

    If you fear “that the reviewers are overburdened” with a problem that would be detected on the first test, you probably should change your processes, to perform unit tests first and only perform code review by humans if the code passed the automated tests.
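For reference, the StackWalker.walk contract quoted above can be exercised directly. This is only a small sketch; StackWalkerDemo, topMethodNames and leakFrames are hypothetical names. It shows the intended use (reduce the stream to a value inside the function) next to the misuse that the JDK handles through documentation rather than through the type system.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StackWalkerDemo {
    // Intended use: turn the frames into a plain value before walk() returns.
    static List<String> topMethodNames(int howMany) {
        return StackWalker.getInstance().walk(frames ->
                frames.map(StackWalker.StackFrame::getMethodName)
                      .limit(howMany)
                      .collect(Collectors.toList()));
    }

    // Misuse: nothing at compile time stops the stream from escaping.
    static Stream<StackWalker.StackFrame> leakFrames() {
        return StackWalker.getInstance().walk(frames -> frames);
    }

    public static void main(String[] args) {
        // The first frame is the method that called walk(), here topMethodNames.
        System.out.println(topMethodNames(2));

        try {
            leakFrames().forEach(System.out::println);
        } catch (IllegalStateException e) {
            // Per the Javadoc, reusing the closed stream is rejected at run time.
            System.out.println("closed stream rejected: " + e.getMessage());
        }
    }
}
```

A helper for the URL case could follow the same shape: a generic <T> T method taking a Function over the Stream, with the restriction stated in its documentation.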