
Online SaltStack minions on Azure lose connection with master on DigitalOcean


I have a Salt 2016.11.3 (Carbon) playground with a master on DigitalOcean and four minions on Azure (three Ubuntu and one Windows).

After a while, the Ubuntu minions stop responding to salt -t 30 '*' test.ping, even though they are online (I can SSH into them).

Restarting the master (systemctl restart salt-master) or the minions (systemctl restart salt-minion) seems to bring the minions back for a while.

Things checked:

Also, after a restart I get a double response from the re-added nodes, but I think this is a cache problem, because it disappears after some time (cache invalidation).


Solution

  • It seems to be a communication error. There is an older bug report from 2013 in the SaltStack GitHub repo, and someone states in the comments that AWS and Azure load balancers don't respect TCP keepalives.

    Suggested solutions:

    1. add a cron job on the master that pings the minions every minute
    2. change the TCP keepalive settings in the Azure minions' config file

    So far, solution #2 works for me.

    tcp_keepalive: True
    tcp_keepalive_idle: 60
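
    For completeness, solution #1 could be sketched roughly as the cron entry below, placed on the master. This is only an illustration and has not been tested in this setup: the file name /etc/cron.d/salt-ping is hypothetical, the PATH line assumes salt is installed under /usr/bin or /usr/local/bin, and the 30-second timeout is simply copied from the test.ping command in the question.

    ```
    # /etc/cron.d/salt-ping (hypothetical file name), installed on the Salt master
    # Assumption: the salt CLI is found in one of these directories.
    PATH=/usr/local/bin:/usr/bin:/bin

    # Every minute, ping all minions so the connection never sits idle for long.
    # Output is discarded; only the traffic matters here.
    * * * * * root salt -t 30 '*' test.ping >/dev/null 2>&1
    ```

    The idea is the same as with the keepalive settings: regular traffic keeps an intermediate load balancer from silently dropping an idle connection.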