java ftp java-stream apache-commons-net

Downloading many FTP files in parallel using Apache FTPClient and Java streams fails with "socket write error" or "Could not parse response code"


How can I retrieve file names from many different directories on an FTP server? It seems that the server closes the connection if it detects a flood of requests.

The code is very simple:

public List<String> getPaths(String path, LocalDate date)  {
    try {
        val listFiles = ftp.listFiles(path);
        return Arrays.stream(listFiles)
                .parallel()
                .filter(f->f.getTimestamp().getTime().toInstant().isAfter(date.atStartOfDay(ZoneId.systemDefault()).toInstant()))
                .map(FTPFile::getName)
                .collect(Collectors.toList());
    } catch (IOException e) {
        e.printStackTrace();
        return Collections.emptyList();
    }
}

and

public List<String> getPurchaseList(LocalDate date) {
    try (FTPClientWrapper wrapper = new FTPClientWrapper(
            host, port, login, password)

    ) {
        wrapper.connect();
        val regionList = wrapper.readFile(dir, fileName);
        val dirList = regionList.stream()
                .flatMap(x -> purchaseTypes.stream().map(y -> String.format("%s/%s/%s/%s", dir, x, y, folder)))
                .collect(Collectors.toList());

        return dirList.stream().flatMap(d -> wrapper.getPaths(d, date).stream())
                .collect(Collectors.toList());
    } catch (IOException e) {
        e.printStackTrace();
        return Collections.singletonList(e.getMessage());
    }
}

If I call getPurchaseList, it fails with:

SocketException: Connection reset by peer: socket write error

If I use parallelStream() instead of regionList.stream(), it fails with:

org.apache.commons.net.MalformedServerReplyException: Could not parse response code.
Server Reply: e227 Entering Passive Mode (95,167,245,94,117,237)

Is there any way to process 1000+ directories on an FTP server, or is it impossible?

I use Apache Commons Net FTP client.


Solution

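  • You cannot download multiple files in parallel over one connection with the FTP protocol. In general, you cannot use one connection in parallel from multiple threads for any operation whatsoever, directory listings included. The control connection carries one command and one reply at a time, so commands interleaved from several threads produce garbled replies such as the "e227" response above.

    You have to open a separate connection for each thread. In your case, you will want to implement a connection pool.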
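
A minimal sketch of such a pool, using only the JDK so that it runs without an FTP server. The names ConnectionPool, PooledListing, withConnection and listAll are hypothetical and not part of Apache Commons Net. It assumes that the factory returns a client that is already connected and logged in (with Commons Net, an FTPClient after connect and login), and that the lister wraps FTPClient.listFiles and rethrows IOException as an unchecked exception.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Fixed-size pool: a borrowed connection is used by exactly one thread at a time. */
final class ConnectionPool<C> {
    private final BlockingQueue<C> idle;

    ConnectionPool(int size, Supplier<C> factory) {
        this.idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            // Assumption: the factory hands back a ready-to-use connection.
            idle.add(factory.get());
        }
    }

    /** Borrows a connection, runs the task on it, and always gives it back. */
    <R> R withConnection(Function<C, R> task) {
        C conn;
        try {
            conn = idle.take(); // blocks until some connection is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a connection", e);
        }
        try {
            return task.apply(conn);
        } finally {
            idle.add(conn); // return the connection even if the task failed
        }
    }
}

final class PooledListing {
    /**
     * Lists every directory in parallel. Each listing borrows its own
     * connection, so no connection is ever shared between threads.
     * The lister is a placeholder for the real FTP call.
     */
    static <C> List<String> listAll(ConnectionPool<C> pool, List<String> dirs,
                                    BiFunction<C, String, List<String>> lister) {
        return dirs.parallelStream()
                .flatMap(dir -> pool.withConnection(conn -> lister.apply(conn, dir)).stream())
                .collect(Collectors.toList());
    }
}
```

The pool size caps the number of simultaneous connections regardless of how many threads the parallel stream uses. Keeping it small is probably wise, since many FTP servers limit connections per user or per address. A production pool would also need to close its connections and replace ones that the server has dropped, which this sketch leaves out.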