Verify PagerDuty Reaches On-Call via Cron

We had a few incidents where on-call devs missed their calls because of various spam-blocking setups or “do not disturb” settings.
We now run a small service that test-notifies everyone once a month to make sure notifications go through. Notifications go out shortly before their ‘do not disturb’ window ends, so we do not wake anyone in the middle of the night but still test a realistic situation.
Our setup has more logging, stats, etc., but it goes something like this:

# configure user schedule
require 'yaml'
users = YAML.load <<~YAML
- name: "John Doe"
  id: ABCD
#  cron: "* * * * * America/Los_Angeles" # every minute ... for local testing
  cron: "55 6 * * 2#1 America/Los_Angeles" # every first Tuesday of the month at 6:55am
# ... more users here
YAML
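The `2#1` day-of-week field in the cron line above is fugit's syntax for "the first Tuesday of the month" (weekday 2, occurrence 1). A dependency-free sketch of that rule in plain Ruby, for illustration (the helper name is made up):

```ruby
require 'date'

# "2#1" = weekday 2 (Tuesday), occurrence #1 in the month.
# The first Tuesday always falls on day 1..7, the second on 8..14, etc.
def first_tuesday?(date)
  date.tuesday? && date.day <= 7
end

first_tuesday?(Date.new(2024, 1, 2)) # => true, Tue Jan 2 2024 is the first Tuesday
first_tuesday?(Date.new(2024, 1, 9)) # => false, that's the second Tuesday
```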

# code to notify users
require 'json'
require 'faraday'
def create_test_incident(user)
  connection = Faraday.new
  response = nil
  2.times do
    response = connection.post do |req|
      req.url "https://api.pagerduty.com/incidents"
      req.headers['Content-Type'] = 'application/json'
      req.headers['Accept'] = 'application/vnd.pagerduty+json;version=2'
      req.headers['From'] = 'realusers@email.com' # incident owner 
      req.headers['Authorization'] = "Token token=#{ENV.fetch("PAGERDUTY_TOKEN")}"
      req.body = {
        incident: {
          type: "incident",
          title: "Pagerduty Tester: Incident for #{user.fetch("name")}, press resolve",
          service: {
            id: ENV.fetch("SERVICE_ID"),
            type: "service_reference"
          },
          assignments: [{
            assignee: {
              id: user.fetch("id"),
              type: "user_reference"
            }
          }]
        }
      }.to_json
    end
    if response.status == 429 # pagerduty rate-limits to 6 incidents/min/service
      sleep 60
      next
    end
    raise "Request failed #{response.status} -- #{response.body}" if response.status >= 300
    break # success, do not retry and create a duplicate incident
  end
  JSON.parse(response.body).fetch("incident").fetch("id")
end

# run on a schedule (no threading / forking)
require 'serial_scheduler'
require 'fugit'
scheduler = SerialScheduler.new
users.each do |user|
  scheduler.add("Notify #{user.fetch("name")}", cron: user.fetch("cron"), timeout: 10) do
    user_id = user.fetch("id")
    incident_id = create_test_incident(user)
    puts "Created incident for #{user_id} https://#{ENV.fetch('SUBDOMAIN')}.pagerduty.com/incidents/#{incident_id}"
  rescue StandardError => e
    puts "Creating incident for #{user_id} failed #{e}"
  end
end
scheduler.run

Rails Sum ActiveSupport Instrument Times

We wanted to show the sum of multiple ActiveSupport notifications during a long process. Here is a tiny snippet to do that; a more advanced version is used in Samson.

# sum activesupport notification durations (in milliseconds) for the given metrics
def time_sum(metrics, &block)
  sum = Hash.new(0.0)
  add = ->(metric, start, finish, *) { sum[metric] += 1000 * (finish - start) }
  # wrap the block in one `subscribed` layer per metric, then call the result
  metrics.inject(block) do |inner, metric|
    -> { ActiveSupport::Notifications.subscribed(add, metric, &inner) }
  end.call
  sum
end

time_sum(["sql.active_record"]) { 10.times { User.first } }
# {"sql.active_record" => 10.3}
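The `inject` deserves a note: each metric wraps the previous lambda in another subscription layer, so the original block ends up running inside all subscriptions at once. A dependency-free sketch of the same nesting pattern (the labels and `events` array are made up for illustration):

```ruby
# Each inject step wraps the current lambda in another layer; calling the
# final result runs the innermost block inside every layer.
events = []
wrap = ->(label, inner) {
  -> { events << "start #{label}"; inner.call; events << "end #{label}" }
}
nested = ["a", "b"].inject(-> { events << "work" }) { |inner, label| wrap.(label, inner) }
nested.call
events # => ["start b", "start a", "work", "end a", "end b"]
```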

Validating ActiveRecord Backlinks exist

Whenever a new association is added, we usually also need the opposite association to ensure things get cleaned up properly during deletion.
To never forget this, and to audit the current state, these two tests can help.

  def all_models
    models = Dir["app/models/**/*.rb"].grep_v(/\/concerns\//)
    models.size.must_be :>, 20
    models.each { |f| require f }
    ActiveRecord::Base.descendants
  end

  it "explicitly defines what should happen to dependencies" do
    bad = all_models.flat_map do |model|
      model.reflect_on_all_associations.map do |association|
        next if association.is_a?(ActiveRecord::Reflection::BelongsToReflection)
        next if association.options.key?(:through)
        next if association.options.key?(:dependent)
        "#{model.name} #{association.name}"
      end
    end.compact
    assert(
      bad.empty?,
      "These associations need a :dependent defined (most likely :destroy or nil)\n#{bad.join("\n")}"
    )
  end

  it "links all dependencies both ways so dependencies get deleted reliably" do
    bad = all_models.flat_map do |model|
      model.reflect_on_all_associations.map do |association|
        next if association.name == :audits
        next if association.options.fetch(:inverse_of, false).nil? # disabled on purpose
        next if association.inverse_of
        "#{model.name} #{association.name}"
      end
    end.compact
    assert(
      bad.empty?,
      <<~TEXT
        These associations need an inverse association.
        For example project has stages and stage has project.
        If automatic connection does not work, use `:inverse_of` option on the association.
        If the inverse association is missing AND the inverse should not be destroyed when the dependency is destroyed, use `inverse_of: nil`.
        #{bad.join("\n")}
      TEXT
    )
  end
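For reference, a hypothetical model pair that would pass both tests: the `has_many` side declares what happens to its dependents, and both sides resolve each other via `inverse_of` (spelled out explicitly here, though Rails can often infer it):

```ruby
class Project < ActiveRecord::Base
  # destroying a project also destroys its stages
  has_many :stages, dependent: :destroy, inverse_of: :project
end

class Stage < ActiveRecord::Base
  belongs_to :project, inverse_of: :stages
end
```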