java dictionary java-stream long-running-processes

Use Java 8 stream's map function for long running tasks

I have a method that takes a List of URLs (of remote files) as parameter that should be downloaded. The method returns a List of other type (called Attachment), that actually holds a property of java File-type inside. For this case I used the Java Stream API to iterate over the URLs and start the download within a "map"-function that actually then returns the Attachment instance.

Now my question: Am I abusing the Java Stream API for things that its not meant for? Like putting long running tasks into it? Should I just do small operations on the input data?

The only minus I see right now is that it is a bit harder to test.

private List<Attachment> download(List<URL> attachments) {
        return attachments.stream().map(attachmentUrl -> {
            try {
                Attachment attachment = new Attachment();
                File attachmentFile = new File(getFilename(attachment.getAttachmentId(), attachmentUrl));
                FileUtils.copyURLToFile(
                        attachmentUrl,
                        attachmentFile,
                        CONNECT_TIMEOUT,
                        READ_TIMEOUT);
                attachment.setAttachmentFile(attachmentFile);
                return attachment;
            } catch (IOException e) {
                e.printStackTrace();
                LOGGER.error(e.getLocalizedMessage());
            }
            return null;
        }).filter(Objects::nonNull).collect(Collectors.toList());
    }

Solution

I think it might be helpful to think about map and other functional constructs (e.g. filter, reduce, etc.) not so much as functions, but rather as syntax. stream().map() is syntax which performs the functional equivalent of a for loop. Asking "am I abusing this syntax because of what I use it to execute?" is then less meaningful: for loops do not care how long the tasks they run on each iteration take, and neither does map. It's agnostic to the operation it's applying, so the only question is are you using the syntax appropriately, i.e. to loop over a collection, mapping from something to something.

In this context, where map is syntax, your desired operations are perfectly alright. However, your implementation could be cleaned up a bit.

attachmentUrl -> {
    try {
        Attachment attachment = new Attachment();
        File attachmentFile = new File(getFilename(attachment.getAttachmentId(), attachmentUrl));
        FileUtils.copyURLToFile(
                attachmentUrl,
                attachmentFile,
                CONNECT_TIMEOUT,
                READ_TIMEOUT);
        attachment.setAttachmentFile(attachmentFile);
        return attachment;
    } catch (IOException e) {
        e.printStackTrace();
        LOGGER.error(e.getLocalizedMessage());
    }
    return null;
}

Is a tad big for an inlined map lambda. In general, I tend to be skeptical, though not always disapproving, of any map lambda that requires curly braces, i.e. that takes up more than one line. I'd suggest refactoring this lambda into a named function, and possibly a couple, which are either nested (map(this::A), where A then calls B) or used serially by your stream operations map(this::A).map(this::B).

[EDIT:] Regarding parallelizing your stream: keep in mind that you're doing more than just CPU processing as part of this method - you seem to be doing network IO and file IO. If you execute in parallel, you'll be parallelizing not just your CPU utilization but also your network and disk usage. If network or disk is the dominant factor, not CPU, then parallelization will likely get you very little, and may make things worse. In general, more threads != faster network or disk read/write. You might find this question on parallel IO useful.