
Perl Expect spawn limit


For performance reasons and to optimize the overall runtime, I want to know what is limiting my number of SSH connections.

A Bash script calls X Perl scripts. Each Perl script spawns a new SSH connection to a different IP.

This is how it works:

max_proc_ssh=400
current_proc_ssh=0

while read codesite ip operateur hostname
do
    # Launch one Perl/SSH worker in the background
    (sleep 3; /usr/bin/perl $DIR/RTR-sshscript.pl $codesite $ip $operateur $hostname) &
    ((current_proc_ssh++))
    # Once max_proc_ssh workers have been launched, wait for all of them to finish
    if [ $current_proc_ssh -eq $max_proc_ssh ]; then
        printf "Pausing with $max_proc_ssh processes...\n"
        current_proc_ssh=0
        wait
    fi
done <<< "$temp_info"

Each RTR-sshscript.pl spawns a new Expect session with an SSH connection and sends a lot of commands; each run lasts about 3 minutes:

$exp->spawn("ssh -o ConnectTimeout=$connectTimeout $user\@$ip") or die ("unable to spawn \n");

With max_proc_ssh=200 I have no issue; the scripts run fine. But when I go up to max_proc_ssh=400, the Expect module cannot handle it and sometimes tells me **unable to spawn**. I would say that of the 400 expected sessions, only about 350 actually start.

What is wrong with this? I am trying to define a sublimit to avoid launching 400 Expect sessions at the same time, something like:

max_proc_ssh=400
max_sublimit_ssh=200
current_proc_ssh=0
current_sublimit_ssh=0

while read codesite ip operateur hostname
do
    (sleep 3; /usr/bin/perl $DIR/RTR-sshscript.pl $codesite $ip $operateur $hostname) &
    ((current_proc_ssh++))
    ((current_sublimit_ssh++))
    # After each batch of max_sublimit_ssh spawns, pause briefly before the next batch
    if [ $current_sublimit_ssh -eq $max_sublimit_ssh ]; then
        printf "Pausing, sublimit SSH reached...\n"
        sleep 3
        current_sublimit_ssh=0
    fi
    # After max_proc_ssh spawns, wait for all of them to finish
    if [ $current_proc_ssh -eq $max_proc_ssh ]; then
        printf "Pausing with $max_proc_ssh processes...\n"
        current_proc_ssh=0
        current_sublimit_ssh=0
        wait
    fi
done <<< "$temp_info"

This would let the script launch 200 Expect sessions, wait 3 seconds, launch the next 200, and then wait for all 400 to finish before starting again.

EDIT: As suggested in the comment section, I added "$!" to the error message, and now I get this:

./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable

What does that mean? Am I hitting the fork limit? How can I increase it? By modifying the sysctl.conf file?
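
That error is fork(2) failing with EAGAIN, which usually points at a per-user resource limit rather than a system-wide one. Below is a minimal sketch of what to check, run as the same user that launches the script; the sysctl names are the standard Linux ones, and the per-user values come from ulimit / /etc/security/limits.conf rather than sysctl.conf, which only controls the kernel-wide ceilings.

    # Per-user soft limits seen by the shell that runs the script
    ulimit -Su    # max user processes ("nproc"); fork() fails with EAGAIN when this is exhausted
    ulimit -Sn    # max open files per process ("nofile"); each Expect/SSH session holds several FDs

    # System-wide ceilings, for comparison
    sysctl kernel.pid_max
    sysctl fs.file-max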

Searching around a bit, people say to check what

sysctl fs.file-nr

is saying. But when I start the script, it doesn't go higher than this:

 sysctl fs.file-nr
fs.file-nr = 27904      0       793776

The ulimit for my user is 4096, but when the script starts, the counter goes way higher than that:

 sudo lsof -u restools 2>/dev/null | wc -l
25258
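
Note that these two numbers are not directly comparable: lsof -u lists every open file of every process owned by the user (memory-mapped libraries, current directories, and so on), while the nofile ulimit is enforced per process and the nproc ulimit counts processes. A rough sketch for comparing like with like; restools is the user from the output above, and the loop is only illustrative, not part of the original script:

    # Processes owned by the user, to compare against the nproc limit
    ps -u restools --no-headers | wc -l

    # Real file descriptors per process, to compare against the nofile limit
    for pid in $(pgrep -u restools); do
        printf '%s %s\n' "$pid" "$(ls /proc/$pid/fd 2>/dev/null | wc -l)"
    done | sort -k2 -n | tail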

Solution

  • It appears that it's not a process limitation, but an open-files limitation.

    Adding these lines:

    restools         soft    nproc           2048
    restools         soft    nofile          2048
    

    to the /etc/security/limits.conf file solved the issue! The first line raises the maximum number of processes for the user to 2048, and the second raises the maximum number of open files to 2048; both were previously 1024. (A quick way to verify the new limits is sketched below.)

    Tested and approved.
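
    The new values only apply to sessions started after the change, so log out and back in (or start a fresh login shell) before re-running the script. A quick check, assuming the 2048 values above:

        # Run as restools in a fresh login session
        ulimit -Su    # soft limit on processes; should now report 2048
        ulimit -Sn    # soft limit on open files; should now report 2048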