
Kotlin/Java ProcessBuilder efficiency vs. Python subprocesses


I'm trying to write a JVM program that needs to call a program written in a different language many times, with different parameters each time.

For example, say my main program (Kotlin/JVM) needs to call a Node.js program 1000 times within 10 seconds.

Currently I use the ProcessBuilder class to create a new process for each call so I can get the results back into my main function, but it isn't fast enough; it's actually slow :/

After some research I found Python's subprocess module and implemented the same idea there. In Python 3.9 my implementation worked great, and fast!

1. What is the difference between Python's subprocess and the JVM's Process?

2. Is there a way to create a subprocess on the JVM the way Python does?

From what I've read, subprocesses can also be created on the JVM by calling .start() on the same ProcessBuilder, but it's still slow.

To clarify, a single call wouldn't be a problem. The problem is that I need to call this file 1000 times in 10-20 seconds.

Here is some example code.

Kotlin example: from my testing, the waitFor() call takes a long time, and that's my problem.

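// Each argument must be its own string; "node somefile.js" as a single string
// would be looked up as the name of the program to run
val processBuilder = ProcessBuilder("node", "somefile.js")
val process = processBuilder.start()
process.waitFor()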

Python example

import subprocess

subprocess.Popen(["node", "somefile.js"])

If these are equivalent, is there any way to optimize process execution on the JVM, for example through environment changes?


Solution

  • Python's Popen function is the equivalent of Java's ProcessBuilder.start() method.

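    In your example above, you are comparing the time the JVM takes for the subprocess to complete with the time Python takes for the subprocess to start. Popen returns as soon as the child has been launched, without waiting for it to finish.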

    To compare like with like, you should compare:

    JVM

    // Start subprocess
    val processHandle = ProcessBuilder("node", "someFile.js").start()
    // Wait for the subprocess to terminate
    val returnCode = processHandle.waitFor()
    

    to

    Python

    import subprocess

    # Start subprocess
    process_handle = subprocess.Popen(["node", "someFile.js"])
    # Wait for the subprocess to terminate
    return_code = process_handle.wait()
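
To see the difference concretely, here is a minimal Python sketch that times Popen() separately from wait(), mirroring start() and waitFor() on the JVM. As an assumption, it launches a short Python one-liner that sleeps for 0.2 s instead of node someFile.js, so it runs without Node.js installed; the helper name time_start_and_wait is hypothetical.

```python
import subprocess
import sys
from time import perf_counter

# Assumed stand-in for "node someFile.js": a Python one-liner that sleeps 0.2 s,
# so the sketch runs without Node.js installed.
CHILD = [sys.executable, "-c", "import time; time.sleep(0.2)"]


def time_start_and_wait(command):
    """Return (seconds until Popen returned, seconds until the child exited, exit code)."""
    t0 = perf_counter()
    handle = subprocess.Popen(command)  # like ProcessBuilder(...).start(): returns once launched
    started = perf_counter() - t0
    return_code = handle.wait()         # like process.waitFor(): blocks until the child exits
    completed = perf_counter() - t0
    return started, completed, return_code


if __name__ == "__main__":
    started, completed, return_code = time_start_and_wait(CHILD)
    print(f"Popen returned after {started * 1000:.1f} ms")
    print(f"wait() returned after {completed * 1000:.1f} ms (exit code {return_code})")
```

Popen() typically returns within a few milliseconds, while wait() returns only after the child's 0.2 s sleep; subprocess.run() performs both steps in a single call.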
    

    EDIT

    I've run a simple test on my laptop and haven't seen significant performance differences between Kotlin and Python. I'll post it here as a starting point; even though the measurements aren't done "properly" (that would require JMH for Kotlin), they give an idea:

    Kotlin

    For Kotlin, I wrote the following .kts script:

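    fun main() {
        var started = 0L    // total ns spent in start()
        var completed = 0L  // total ns spent in start() + waitFor()

        for (i in 0 until 1000) {

            val start = System.nanoTime()

            // stdout goes to a pipe that is never read; fine for short output like ls
            val process = ProcessBuilder("ls").start()

            started += (System.nanoTime() - start)

            process.waitFor()

            completed += (System.nanoTime() - start)
        }

        // Over 1000 runs, total ns * 1e-9 == (total ns / 1000) / 1e6, i.e. the average in ms
        println("Average time (ms) to start a process: ${started * 1e-9}")
        println("Average time (ms) to complete a started process: ${completed * 1e-9}")
    }

    // A .kts script does not call main() automatically, so invoke it explicitly
    main()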
    

    Loaded into the Kotlin 1.4.21 REPL running on JRE 10, it produced the following output:

    Average time (ms) to start a process: 0.667509729
    Average time (ms) to complete a started process: 5.042644314
    

    Python

    On Python 3.7.9, the following script:

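    import subprocess
    from time import perf_counter_ns

    started = 0    # total ns spent in Popen()
    completed = 0  # total ns spent in Popen() + wait()

    for i in range(1000):

        start = perf_counter_ns()

        # stdout is inherited from the parent, so ls prints to the console
        process = subprocess.Popen("ls")

        started += (perf_counter_ns() - start)

        process.wait()

        completed += (perf_counter_ns() - start)

    # Over 1000 runs, total ns * 1e-9 == (total ns / 1000) / 1e6, i.e. the average in ms
    print("Average time (ms) to start a process: ", started * 1e-9)
    print("Average time (ms) to complete a process: ", completed * 1e-9)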
    

    outputs:

    Average time (ms) to start a process:  1.620647841
    Average time (ms) to complete a process:  6.208644367000001
    

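    So my current view is that there should not be any big performance gap between the two approaches once the execution context is ready. If you do notice big differences, the problem may come from other code or initialization outside the subprocess handling. For reference, 1000 calls in 10 seconds leaves a budget of about 10 ms per call, and both runtimes stay under that here for a trivial command like ls; a heavier child such as node is likely to take longer to start.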

    At this point, more details (ideally a minimal reproducible example) are needed to find the right answer.
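
One difference between the two benchmarks above is worth spelling out, because it can make waitFor() look slow or even hang: ProcessBuilder connects the child's stdout to a pipe by default, whereas Popen() with default arguments lets the child inherit the parent's stdout. If a child writes more than the OS pipe buffer holds and nobody reads the pipe, it blocks on write and cannot exit, so waitFor() (or Popen.wait()) does not return. The Python sketch below assumes a hypothetical chatty child that prints about 1 MB and drains the pipe with communicate() before checking the exit code; on the JVM the counterpart is to read process.inputStream before calling waitFor(), or to redirect the output with ProcessBuilder.Redirect.INHERIT or ProcessBuilder.Redirect.DISCARD. The helper name run_and_capture is hypothetical.

```python
import subprocess
import sys

# Hypothetical chatty child: prints 1,000,000 characters, well beyond a typical
# OS pipe buffer (commonly 64 KiB on Linux). It stands in for a child such as
# "node someFile.js" that produces a lot of output.
CHATTY_CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1_000_000)"]


def run_and_capture(command):
    """Run command with stdout piped, drain the pipe, and return (exit code, output)."""
    handle = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    # communicate() reads stdout until EOF and then waits for the child to exit.
    # Calling handle.wait() here without reading the pipe could block forever
    # once the pipe buffer is full.
    output, _ = handle.communicate()
    return handle.returncode, output


if __name__ == "__main__":
    code, output = run_and_capture(CHATTY_CHILD)
    print(f"exit code {code}, captured {len(output)} characters")
```

subprocess.run(command, capture_output=True, text=True) wraps the same Popen/communicate sequence in a single call.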