ruby monitoring jobs sidekiq reliability

What approach can I use to notify myself of jobs that have not run on schedule for any reason (OOM, etc.)?


I have quite a few workers that run on schedules ranging from hourly to daily. There have been incidents where a few of them simply did not execute, without any sign of failure. I need a way to track these. I thought about having a listener that logs every time a worker starts, but there are too many workers to keep track of that way. A better approach would be to know when a worker did not run; that is what matters more.

I've thought about creating a table that logs when each worker starts execution; if the last log entry for a worker is older than the interval it is supposed to run at, I get notified.


Solution

  • This should give you some ideas for using the Sidekiq API to send notifications, here through a Slack notifier class. You could put the check in a worker and run it on its own schedule. Of course, if that worker also fails for lack of resources, you have a compounding problem, but hopefully your queues have priorities.

    require 'json'
    require 'net/http'
    require 'uri'
    require 'openssl'

    # Posts a text message to a Slack incoming webhook.
    class SlackNotifier
      attr_reader :params

      def initialize(params)
        @params = params
      end

      def notify
        return if ENV['SLACK_WEBHOOK'].nil?

        uri = URI.parse(ENV['SLACK_WEBHOOK'])
        http = Net::HTTP.new(uri.host, uri.port)
        http.use_ssl = true
        # Certificate verification is skipped everywhere except Rails production.
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE unless defined?(Rails) && Rails.env.production?
        request = Net::HTTP::Post.new(uri.request_uri)
        # JSON.generate produces valid JSON and escapes quotes in the text.
        payload = JSON.generate(channel: 'dev', username: 'webhookbot', text: params[:text])
        request.set_form_data('payload' => payload)
        http.request(request)
      end
    end
    
    
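    require 'sidekiq/api'

    # Jobs that have been waiting in the 'long_running' queue for over 8 hours.
    # `8.hours.ago` comes from ActiveSupport, so this assumes a Rails environment.
    queue = Sidekiq::Queue.new('long_running')
    stale_jobs = queue.select { |job| job.enqueued_at < 8.hours.ago }

    stale_jobs.each do |job|
      SlackNotifier.new(text: job.item.to_s).notify
    end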
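
The queue check above catches jobs that were enqueued but never picked up. The "last run" table from the question covers the other case, where a job was never enqueued at all. Here is a minimal sketch of that idea in plain Ruby. `HeartbeatLog`, `MAX_GAPS`, and the worker names are all made up for illustration, and the in-memory hash stands in for whatever you would really use (a database table, Redis, etc.):

```ruby
# Hypothetical sketch: HeartbeatLog and MAX_GAPS are illustrative names,
# and the in-memory hash stands in for a real table or Redis key.
class HeartbeatLog
  def initialize
    @last_run = {} # worker name => Time of the last recorded start
  end

  # Call this at the start of each worker's perform method.
  def record(worker, at: Time.now)
    @last_run[worker] = at
  end

  # Returns the names of workers whose last recorded start is older than
  # their allowed gap, or that have never reported a start at all.
  def overdue(max_gaps, now: Time.now)
    max_gaps.select do |worker, max_gap|
      last = @last_run[worker]
      last.nil? || now - last > max_gap
    end.keys
  end
end

# Allowed gap per worker in seconds: the schedule interval plus some slack.
# These values are assumptions, not taken from the question.
MAX_GAPS = {
  'HourlyReportWorker' => 90 * 60,
  'DailyCleanupWorker' => 26 * 60 * 60
}.freeze
```

A separate scheduled check could then call `overdue(MAX_GAPS)` and pass each returned name to `SlackNotifier`. Because the list of expected workers lives in one hash, a worker that never started still shows up as overdue, which is the case a start-time log alone would miss.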