Listing Unmuted Datadog Alerts

Datadog's UI for alerting monitors also lists muted ones, with no option to filter them out. This adds overhead and confusion when trying to track down what exactly is down.

So we added an alerts task to kennel that lists all unmuted alerts and how long they have been alerting. It also shows alerts that have no-data warnings, even though the Datadog UI shows them as not alerting.

bundle exec rake kennel:alerts TAG=team:my-team
Downloading ... 5.36s
Foo certs will expire soon🔒
Ignored cluster:pod1,server: 39:22:00
Ignored cluster:pod12,server: 31:41:00
Ignored cluster:pod12,server: 31:41:00

Foobar Errors (Retry Limit Exceeded)🔒
Alert cluster:pod2 19:05:16
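The filtering described above can be sketched in Ruby. This is a hypothetical illustration, not kennel's actual code. It assumes each monitor hash (parsed from Datadog's monitor JSON) carries options.silenced for mutes and state.groups with a status and last_triggered_ts per group; treat those field names as assumptions. Any group that is not OK is kept, including No Data, and the time since it triggered is formatted as HH:MM:SS like in the output above.

```ruby
# Hypothetical sketch, not kennel's implementation. Assumed monitor JSON shape:
# { "name" => ..., "options" => { "silenced" => { scope => end_ts } },
#   "state" => { "groups" => { group => { "status" => ..., "last_triggered_ts" => ... } } } }

# A group counts as muted if the whole monitor is muted ("*") or that exact
# group name is muted (a simplification of Datadog's mute scopes).
def muted?(monitor, group)
  silenced = monitor.dig('options', 'silenced') || {}
  silenced.key?('*') || silenced.key?(group)
end

# Formats seconds as HH:MM:SS; hours may exceed 24 (e.g. 39:22:00).
def format_duration(seconds)
  hours, rest = seconds.divmod(3600)
  minutes, secs = rest.divmod(60)
  format('%02d:%02d:%02d', hours, minutes, secs)
end

# Returns [[monitor name, [[status, group, "HH:MM:SS"], ...]], ...] for every
# monitor with at least one unmuted group in a non-OK state (Alert, No Data, ...).
def unmuted_alerts(monitors, now: Time.now.to_i)
  monitors.map do |monitor|
    groups = (monitor.dig('state', 'groups') || {}).map do |group, info|
      next if info['status'] == 'OK' || muted?(monitor, group)
      [info['status'], group, format_duration(now - info['last_triggered_ts'].to_i)]
    end.compact
    [monitor['name'], groups] unless groups.empty?
  end.compact
end
```

Fully muted monitors and monitors whose groups are all OK drop out, so only what actually needs attention is printed.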

Finding latest AWS ECR image in all repositories

We had a lot of ECR repositories that needed to be taken down, so I ran this little aws-cli + Ruby + jq script to sanity-check that all of their images are old:

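require 'json'
require 'time'
repos = `aws ecr describe-repositories | jq --raw-output '.repositories[].repositoryName'`.split("\n")
pushed = repos.map do |repo|
  out = `aws ecr describe-images --repository-name #{repo} --output json --query 'sort_by(imageDetails,& imagePushedAt)[*]'`
  print '.'
  next unless image = JSON.parse(out).last # sorted oldest first, so last is the latest; nil for empty repos
  at = image.fetch('imagePushedAt') # epoch number or ISO 8601 string, depending on aws-cli version/config
  [repo, at.is_a?(Numeric) ? Time.at(at) : Time.parse(at)]
end
puts
pushed.compact.sort_by(&:last).each { |repo, time| puts "#{time} #{repo}" }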
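To turn the listing into the actual sanity check, the [repo, time] pairs collected in pushed can be filtered against a cutoff. A minimal sketch; the helper name recently_pushed and the 90-day threshold are hypothetical choices, not part of the original script:

```ruby
# Hypothetical helper: given [repo, latest_push_time] pairs (nil entries for
# empty repositories are ignored), return the repositories pushed after cutoff.
def recently_pushed(pushed, cutoff)
  pushed.compact.select { |_repo, time| time > cutoff }.map(&:first)
end

# e.g. repositories that saw a push within the last 90 days (hypothetical threshold):
# recently_pushed(pushed, Time.now - 90 * 24 * 60 * 60)
```

An empty result means every repository's latest image is older than the cutoff.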

Ruby: Waiting for one of multiple threads to finish

We built a small project that watches multiple metrics until one of them finds something. ThreadsWait from the stdlib made this easy to implement. I also added error re-raising so failing threads do not die silently, plus cleanup of the threads that are still running.

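require 'thwait'

def wait_for_first_block_to_complete(*blocks)
  threads = blocks.map do |block|
    Thread.new do
      block.call
    rescue StandardError => e
      e # return the error so the waiting thread can re-raise it
    end
  end
  waiter = ThreadsWait.new(*threads)
  value = waiter.next_wait.value # value of the first thread to finish
  raise value if value.is_a?(StandardError)
  value
ensure
  threads&.each(&:kill) # clean up the threads that are still running
end

wait_for_first_block_to_complete(
  -> { sleep 5 }, -> { sleep 1 }, -> { sleep 2 }
) # will stop after 1 second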
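If thwait is not available in your Ruby, the same pattern can be sketched with a core Queue instead. This swaps in a different technique than the one above, and the method name first_to_finish is hypothetical: each thread pushes a tagged result, the caller pops whichever arrives first, and the ensure block kills the rest.

```ruby
# Sketch: return the value of whichever block finishes first, re-raising its error.
def first_to_finish(*blocks)
  queue = Queue.new
  threads = blocks.map do |block|
    Thread.new do
      queue << [:ok, block.call]
    rescue StandardError => e
      queue << [:error, e]
    end
  end
  status, value = queue.pop # blocks until the first thread pushes a result
  raise value if status == :error
  value
ensure
  threads&.each(&:kill) # stop the threads that are still running
end
```

Tagging results as [:ok, value] or [:error, e] also means a block that legitimately returns an exception object is not mistaken for a failure.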


Reading journald kernel logs from inside a Kubernetes pod

We wanted a watcher that alerts us when bad things happen in the kernel, and were able to deploy it to every node as a Kubernetes DaemonSet 🙂

  • Use a Debian base image (for example ruby:2.5-stretch)
  • Run as root or as a user that can read the systemd logs, for example a member of the systemd-journal group
  • Mount the node's /run/log/journal into the container
      containers:
      - name: foo
        volumeMounts:
        - name: runlog
          mountPath: /run/log/journal
          readOnly: true
      volumes:
      - name: runlog
        hostPath:
          path: /run/log/journal
  • Use the systemd-journal gem to read the logs
    require 'systemd/journal'
    journal = Systemd::Journal.new
    journal.filter(syslog_identifier: 'kernel') # only kernel messages
    journal.each { |entry| puts entry.message }
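Building on the reader above, the watcher itself can be sketched as a matcher plus a loop that follows new kernel entries. The patterns in BAD_KERNEL_PATTERNS are hypothetical examples of "bad kernel things" (OOM kills, hung tasks, segfaults), and the follow loop assumes the systemd-journal gem's seek, move_previous and watch methods; only the matcher runs without a journal.

```ruby
# Hypothetical examples of kernel messages worth alerting on; adjust as needed.
BAD_KERNEL_PATTERNS = [
  /invoked oom-killer/,
  /Out of memory: Kill/,
  /blocked for more than \d+ seconds/,
  /segfault at/
].freeze

def bad_kernel_message?(message)
  BAD_KERNEL_PATTERNS.any? { |pattern| pattern.match?(message.to_s) }
end

# Follows new kernel entries and yields the bad ones. Assumes the systemd-journal
# gem and the /run/log/journal mount; the gem is only required when this is called.
def watch_kernel_log
  require 'systemd/journal'
  journal = Systemd::Journal.new
  journal.filter(syslog_identifier: 'kernel')
  journal.seek(:tail) # skip old entries, only react to new ones
  journal.move_previous
  journal.watch do |entry| # does not return
    yield entry.message if bad_kernel_message?(entry.message)
  end
end

# watch_kernel_log { |message| puts "ALERT: #{message}" }
```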

Running multiple commands in docker in parallel

We went through foreman, goreman and forego, and each of them lacked at least one of:
– an option to not print the process name
– killing all processes when one finishes
– sending signals to all children

But GNU parallel does all of this:

## Install parallel with `done` support
  curl -sL > /tmp/parallel.tar.bz2 && \
  cd /tmp && tar -xvjf /tmp/parallel.tar.bz2 && cd parallel* && \
  ./configure && make install && rm -rf /tmp/parallel*

# stream output and stop all commands if any of them finish/fail
parallel --no-notice --ungroup --halt 'now,done=1' {1} ::: 'sleep 10' 'sleep 20'
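For comparison, the three behaviors listed above can be sketched in Ruby with Process.spawn, Process.wait2 and trap. This is a hypothetical helper (run_until_first_exits), not what the post uses: children share the parent's stdout/stderr, so output streams without name prefixes; the first exit stops the rest with TERM; and INT/TERM received by the parent are forwarded to every child.

```ruby
# Sends signal to each pid, ignoring children that are already gone.
def signal_all(pids, signal)
  pids.each do |pid|
    Process.kill(signal, pid)
  rescue Errno::ESRCH
    # child already exited and was reaped
  end
end

# Runs all commands, returns the exit status of the first one to finish.
def run_until_first_exits(*commands)
  pids = commands.map { |command| Process.spawn(command) }
  previous = %w[INT TERM].map { |signal| [signal, trap(signal) { signal_all(pids, signal) }] }
  first_pid, status = Process.wait2 # blocks until any child exits
  rest = pids - [first_pid]
  signal_all(rest, 'TERM')
  rest.each { |pid| Process.wait(pid) } # reap the stopped children
  status.exitstatus
ensure
  previous&.each { |signal, handler| trap(signal, handler) } # restore old handlers
end

# run_until_first_exits('sleep 10', 'sleep 20')
```

Commands containing shell syntax are started through /bin/sh, which may not pass signals on to its own children, and a child that ignores TERM would make the final wait hang.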