For performance reasons and to reduce the overall run time, I want to know what limits the number of SSH connections I can open.
A Bash script calls X Perl scripts. Each Perl script spawns a new SSH connection to a different IP.
This is how it works:
max_proc_ssh=400
while read codesite ip operateur hostname
do
    (sleep 3; /usr/bin/perl "$DIR/RTR-sshscript.pl" "$codesite" "$ip" "$operateur" "$hostname") &
    ((current_proc_ssh++))
    if [ $current_proc_ssh -eq $max_proc_ssh ]; then
        printf "Pausing with $max_proc_ssh processes...\n"
        current_proc_ssh=0
        wait
    fi
done <<< "$temp_info"
Each RTR-sshscript.pl spawns a new Expect session with an SSH connection and sends a lot of commands; each run lasts about 3 minutes:
$exp->spawn("ssh -o ConnectTimeout=$connectTimeout $user\@$ip") or die ("unable to spawn \n");
With max_proc_ssh=200 I have no issue; the scripts run fine.
But when I go up to max_proc_ssh=400, the Expect module cannot handle it. It sometimes tells me **unable to spawn**.
I would say that, out of the 400 expected, only about 350 actually start.
What is wrong with this? I am trying to define a sublimit to avoid launching 400 Expect sessions at the same time, something like this:
max_proc_ssh=400
max_sublimit_ssh=200
while read codesite ip operateur hostname
do
    (sleep 3; /usr/bin/perl "$DIR/RTR-sshscript.pl" "$codesite" "$ip" "$operateur" "$hostname") &
    ((current_proc_ssh++))
    ((current_sublimit_ssh++))
    if [ $current_sublimit_ssh -eq $max_sublimit_ssh ]; then
        printf "Pausing, sublimit SSH reached...\n"
        sleep 3
        current_sublimit_ssh=0
    fi
    if [ $current_proc_ssh -eq $max_proc_ssh ]; then
        printf "Pausing with $max_proc_ssh processes...\n"
        current_proc_ssh=0
        current_sublimit_ssh=0
        wait
    fi
done <<< "$temp_info"
This would launch 200 Expect sessions, wait 3 seconds before launching the next 200, and then wait for all 400 to finish before starting again.
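As a side note, another way to throttle this (just a sketch, not the approach used above) is to cap the number of background jobs actually running instead of counting launches in fixed batches, so a new connection starts as soon as an old one finishes; the variable names ($DIR, $temp_info) are reused from the script above:

max_proc_ssh=200
while read codesite ip operateur hostname
do
    # Wait until we are below the limit before launching another Perl/SSH worker
    while [ "$(jobs -rp | wc -l)" -ge "$max_proc_ssh" ]; do
        sleep 1
    done
    (sleep 3; /usr/bin/perl "$DIR/RTR-sshscript.pl" "$codesite" "$ip" "$operateur" "$hostname") &
done <<< "$temp_info"
wait    # let the last workers finish

This keeps the pipeline full instead of stalling on the slowest connection in each batch of 400.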
EDIT: As described in the comment section, I added "$!" to the error message, and now I get this:
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
What does that mean? Am I overwhelming the fork limit? How can I increase it? By modifying the sysctl.conf file?
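For reference, a quick way to see the current per-user limits is to run these in the same shell, and as the same user, that launches the script:

ulimit -u    # max user processes; a failing fork() usually points here
ulimit -n    # max open files; each Expect/SSH session holds several descriptors
ulimit -a    # full list of current soft limits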
From a bit of searching on my own, the usual advice is to check what
sysctl fs.file-nr
reports. But when I start the script, it doesn't go higher than this:
sysctl fs.file-nr
fs.file-nr = 27904 0 793776
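Worth noting, as an aside: fs.file-nr is a system-wide counter, so it can stay well below its maximum even while a single user is hitting a per-user limit. Its three values are, in order, the allocated file handles, the allocated-but-unused handles, and the system-wide maximum; comparing it with the per-process soft limit looks like this:

sysctl fs.file-nr    # system-wide: allocated, free, maximum
ulimit -Sn           # per-process soft limit on open files for the current user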
The ulimit for my user is 4096, but when the script starts, the counter goes way higher than that:
sudo lsof -u restools 2>/dev/null | wc -l
25258
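To watch both counters live while the scan is running, something along these lines works (assuming, as above, that the script runs as the restools user):

# Refresh every 5 seconds: open-file count, then process count, for restools
watch -n 5 'lsof -u restools 2>/dev/null | wc -l; ps -u restools --no-headers | wc -l'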
It appears that it's not a process limitation but an open-file limitation.
Adding these lines:
restools soft nproc 2048
restools soft nofile 2048
to the /etc/security/limits.conf file solved the issue!
The first line raises the soft limit on the number of active processes to 2048,
and the second one the number of open files to 2048.
Both were previously 1024.
Tested and approved!
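To double-check that the new values are in effect, note that limits.conf is typically applied at login by pam_limits, so open a fresh session as restools and verify:

ulimit -Su    # soft limit on user processes, should now report 2048
ulimit -Sn    # soft limit on open files, should now report 2048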