Below is a script which logs into a remote host (a Cisco IOS-XR router) and runs a single command via SSH. The idea is to grab the result of the command (an integer) so that it can be graphed by Cacti. Cacti runs this script every 5 minutes when it runs it's normal poll routine:
#!/bin/bash
if [[ -z $1 ]]
then
exit 1
fi
HOST="$1"
USER="cact-ssh-user"
TIMEOUT=10s
export SSHPASS="aaaaaaaaaaaaa"
CMD="show controllers np struct IPV4-LEAF-FAST-P np0 | in Entries"
RAW_OUTPUT=$(timeout $TIMEOUT sshpass -e ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $USER@$HOST "$CMD" 2>/dev/null)
GRT_UCASTV4_USED=$(echo -n "$RAW_OUTPUT" | grep "Entries" | awk '{print $6}' | tr -d "," | tr -d " ")
echo -n "ucastv4_used:$GRT_UCASTV4_USED"
This command works fine via an interactive shell (when I run the script on the Cacti server using /path/to/script/script.sh 10.0.0.1
. However when the Cacti cronjob runs the output is simply blank. So on my SSH session to the Cacti server the output is:
$ ./script 10.0.0.1
ucastv4_used:1234
In the Cacti log the output is: 05/22/2017 03:35:21 PM - SPINE: Poller[0] Host[69] TH[1] DS[6837] SCRIPT: /opt/scripts/cacti-scripts/asr9001-get-tcam-ucast-usage.sh 10.0.0.1, output: ucastv4_used:
I have su
'ed to the Cacti user and the script works just fine. So this seems to be specific to it runnings as a cronjob, the ouput from the SSH command is being redirected somewhere magically and I don't know where or why.
To try and debug this I have added the following lines to the script (directly under #!/bin/bash
) and wait for the Cacti 5 minute poll interval to run (I can see in the Cacti log when the script is called every 5 minutes);
exec >/tmp/stdout.log 2>/tmp/stderr.log
set -x
The stdout.log
just contains ucastv4_used:
the same as cacti.log
and the stderr.log
file contains the login banner for the remote SSH host and nothing else. Where has the SSH output gone?
I have tired to change the SSH line in the script to output to a file and then read from there:
timeout $TIMEOUT sshpass -e ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $USER@$HOST "$CMD" > /tmp/output 2>/dev/null
GRT_UCASTV4_USED=$(grep "Entries" /tmp/output | awk '{print $6}' | tr -d "," | tr -d " ")
The file /tmp/output
is empty and so the GRT_UCASTV4_USED
variable is empty also. stdout.log
ends up being the same as before: ucastv4_used:
I have also tried to change #!/bin/bash
to #!/bin/bash -i
to force an interactive session. This kind of works in that with -i
if I add echo $PS1
to the script I can see in the stdout.log
file that $PS1
is set and without -i
it prints nothing. However there is still no output from the SSH command. Where is the command of the SSH output going?
I have also tried to use ssh ..... | tee /tmp/output
so that the output should show up in /tmp/output
and /tmp/stdout.log
but both are blank.
I can see on the remote router that the SSH session is comming in and running the command. This is from debug ssh server
:
RP/0/RSP0/CPU0:May 22 14:52:57.976 UTC: SSHD_[65909]: (open_master_file) command added show controllers np struct IPV4-LEAF-FAST-P np0 | in Entries
Also since this is working via my interactive session with the Cacti server I am guessing the issue is on there and not the router. I am also confident that Cacti it's self is not the problem, I can trigger spine to poll this router host from my interactive SSH session and the script works fine (further pointing to the issue that some how in a non-interactive shell the SSH output is evaporating):
$ cd /usr/local/spine/bin
$ ./spine -V 7 69 69
...
05/22/2017 04:06:56 PM - SPINE: Poller[0] Host[69] TH[1] DS[6837] SCRIPT: /opt/scripts/cacti-scripts/asr9001-get-tcam-ucast-usage.sh 10.0.0.1, output: ucastv4_used:658809
So it seems that the SSH output is being redirected somewhere and I can't "get it" or the router somehow knows this is a non-interactive SSH client and isn't sending anything back. How else can I debug this?
Update 1
Using debug ssh server
on the Cisco router I have captured the debug logs when I am running the script via my interactive SSH session to the Cacti server and when it runs via Cacti's poll interval/cron job. I have diff
'ed the output and the only interesting looking difference I can find (besides stuff like the SSH PID changing and ephemeral source port of the Cacti server changing etc.) is the following:
*** 132,145 ****
(sshd_interactive_shell) *** removing alarm
sshd_interactive_shell - ptyfd = 46
event_contex_init done
! sshd_ptytonet - Channel 1 Received EOT (bytes:1)
! sshd_ptytonet - Channel 1 exec command executed sending CHANNEL_CLOSE
! (close_channel), pid:182260085, sig rcvd:1, state:10 chan_id:1
! addrem_ssh_info_tuple: REMOVE Inside the critical Section %pid:182260085
! Cleanup sshd process 182260085, session id 1, channel_id 1
! addrem_ssh_info_tuple: REMOVE exiting the Critical Section %pid:182260085
close_channel: Accounting stopped: scriptaccount
! In delete channel code, pid:182260085, sig rcvd:1, state:10 chan_id:1
Sending Exit Status: 0 sig: 1
Sending Channel EOF msg
Sending Channel close msg for remote_chan_id = 0 chan_id = 1
--- 134,147 ----
(sshd_interactive_shell) *** removing alarm
sshd_interactive_shell - ptyfd = 46
event_contex_init done
! Pad_len = 6, Packlen = 12
! sshd_nettopty: EOF received. Disconnecting session
! (close_channel), pid:182329717, sig rcvd:1, state:10 chan_id:1
! addrem_ssh_info_tuple: REMOVE Inside the critical Section %pid:182329717
! Cleanup sshd process 182329717, session id 1, channel_id 1
! addrem_ssh_info_tuple: REMOVE exiting the Critical Section %pid:182329717
close_channel: Accounting stopped: scriptaccount
! In delete channel code, pid:182329717, sig rcvd:1, state:10 chan_id:1
Sending Exit Status: 0 sig: 1
Sending Channel EOF msg
Sending Channel close msg for remote_chan_id = 0 chan_id = 1
The top half is my interactive session with the Cacti server. I note in the top hald sshd_ptytonet - Channel 1 Received EOT (bytes:1)
whereas via the cronjob the debug shows sshd_nettopty: EOF received. Disconnecting session
. Is the non-interactive session simply passing my SSH command to the remote host and quiting as quickly as possible (so it's not waiting for the SSH server to respond with the command output)?
RAW_OUTPUT=$(timeout $TIMEOUT sshpass -e ssh -T -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null $USER@$HOST "$CMD" </dev/zero 2>/dev/null)