
How can I speed up generating MD5 hashes for many files?


I have 10,000 to 12,000 image files on external storage, taking up to 800 MB of space in total.

I use a loop that takes each file path and generates the MD5 of that file, but because so many files have to be read, this takes a lot of time.

This is the method I use to generate the MD5 of a file:

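import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static String getMd5OfFile(String filePath) {

    StringBuilder returnVal = new StringBuilder();

    // try-with-resources closes the stream even if reading fails
    try (InputStream input = new FileInputStream(filePath)) {

        // byte[] buffer  = new byte[1024];
        byte[] buffer = new byte[2048];

        MessageDigest md5Hash = MessageDigest.getInstance("MD5");

        // Feed the file to the digest one chunk at a time
        int numRead;
        while ((numRead = input.read(buffer)) != -1) {
            md5Hash.update(buffer, 0, numRead);
        }

        byte[] md5Bytes = md5Hash.digest();

        // Convert each byte to two hex digits
        for (byte b : md5Bytes) {
            returnVal.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
        }
    } catch (IOException | NoSuchAlgorithmException e) {
        e.printStackTrace();
    }

    // Empty string if the file could not be read
    return returnVal.toString().toUpperCase();
}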

So the question is: can I increase the buffer size to make the operation faster, and if so, how large can I make it without breaking the operation or causing problems for the MD5 generation?

Also, would wrapping the input stream in a buffered stream make it faster?


Solution

  • As with any optimisation problem, you should measure performance to learn whether each change you make has any impact.

    2 KB is certainly a small buffer, and a larger one could do better. But I/O stacks have buffers all the way down, so the impact might be negligible. Try it and measure for yourself.

    Another optimisation worth trying comes from noticing that reading a file is an I/O-bound operation while computing MD5 is CPU-bound. Have one thread read the file content and another thread just update the MD5 state. Depending on the number of CPU cores on your device, you could also hash multiple files in parallel for further gains.
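
    As a rough illustration of the two suggestions above (try different buffer sizes, hash several files in parallel), here is a minimal sketch in plain Java. The class name ParallelMd5Sketch, the methods md5Hex and md5All, and the buffer sizes tried in main are all made up for this example and do not come from the question. It assumes Java 8 language features (lambdas, try-with-resources) are available in your build. It hashes whole files on a thread pool rather than splitting reading and hashing of a single file across two threads.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMd5Sketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hashes one file, reading it in chunks of bufferSize bytes.
    public static String md5Hex(String filePath, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = new FileInputStream(filePath)) {
            int numRead;
            while ((numRead = in.read(buffer)) != -1) {
                md5.update(buffer, 0, numRead);
            }
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : md5.digest()) {
            hex.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return hex.toString();
    }

    // Hashes all files on a fixed pool of worker threads; returns path -> MD5.
    public static Map<String, String> md5All(List<String> paths, int bufferSize, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                // Each task owns its MessageDigest and buffer, so nothing is shared.
                futures.add(pool.submit(() -> md5Hex(path, bufferSize)));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < paths.size(); i++) {
                result.put(paths.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    // Times the same set of files (given as arguments) with a few buffer sizes.
    public static void main(String[] args) throws Exception {
        List<String> paths = Arrays.asList(args);
        int threads = Runtime.getRuntime().availableProcessors();
        for (int bufferSize : new int[] {2 * 1024, 8 * 1024, 64 * 1024}) {
            long start = System.nanoTime();
            md5All(paths, bufferSize, threads);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(bufferSize + " bytes, " + threads + " threads: " + ms + " ms");
        }
    }
}
```

    When comparing timings, keep in mind that the first pass may warm the operating system's file cache, so later passes can look faster for reasons unrelated to the buffer size. Running each configuration several times, in varying order, gives a fairer picture. If storage is the bottleneck, extra threads may bring little or no benefit.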