Listing Unmuted Datadog Alerts

Datadogs UI for alerting monitors also shows muted ones without an option to filter, which leads to overhead / confusion when trying to track down what exactly is down.

So we added an alerts task to  kennel  that lists all unmuted alerts and since when they are alerting, it also shows alerts that have no-data warnings even though Datadog UI shows them as not alerting.

bundle exec rake kennel:alerts TAG=team:my-team
Downloading ... 5.36s
Foo certs will expire soon🔒
https://app.datadoghq.com/monitors/123
Ignored cluster:pod1,server:10.215.225.122 39:22:00
Ignored cluster:pod12,server:10.218.176.123 31:41:00
Ignored cluster:pod12,server:10.218.176.123 31:41:00


Foobar Errors (Retry Limit Exceeded)🔒
https://app.datadoghq.com/monitors/1234
Alert cluster:pod2 19:05:16

Ruby: Waiting for one of multiple threads to finish

We build a small project that watches multiple metrics until one of them finds something, I found ThreadsWait in the stdlib and it was easy to use it. Also added error re-raising so the threads do not die silently and cleanup.

require 'thwait'

def wait_for_first_block_to_complete(*blocks)
  threads = blocks.map do |block|
    Thread.new do
      block.call
    rescue StandardError => e
      e
    end
  end
  waiter = ThreadsWait.new(*threads)
  value = waiter.next_wait.value
  threads.each(&:kill)
  raise value if value.is_a?(StandardError)
  value
end

wait_for_first_block_to_complete(
  -> { sleep 5 }, -> { sleep 1 }, -> { sleep 2 }
) # will stop after 1 second

 

Reading journald kernel logs from inside a kubernetes pod

We wanted a watcher that alerts us when bad kernel things happen and were able to deploy that as a DaemonSet using Kubernetes 🙂

  • Use a Debian base image (for example ruby:2.5-stretch)
  • Run as root user or as user that can read systemd logs like systemd-journal
  • Mount /run/log/journal
    spec:
      containers:
      - name: foo
        ...
        volumeMounts:
        - name: runlog
          mountPath: /run/log/journal
          readOnly: true
      volumes:
      - name: runlog
        hostPath:
          path: /run/log/journal
  • Use systemd-journal to read the logs
    require 'systemd/journal'
    journal = Systemd::Journal.new
    journal.seek(:tail)
    journal.move_previous
    journal.filter(syslog_identifier: 'kernel')
    journal.watch { |entry| puts entry.message }

Chef install google-cloud-sdk without package manager

A tiny chef snipped to install gcloud without using a package manager (to get the latest version without waiting)

gcloud_version = node["foo"]["google-cloud-sdk_version"]
gcloud_file = "google-cloud-sdk-#{gcloud_version}-linux-x86_64.tar.gz"
gcloud_folder = "https://dl.google.com/dl/cloudsdk/channels/rapid/downloads"
gcloud_installer = "https://dl.google.com/dl/cloudsdk/channels/rapid/install_google_cloud_sdk.bash"
execute "gcloud_install" do
  # clean up ... download installer but select the version we want ... install ... link
  command "rm -rf $CLOUDSDK_INSTALL_DIR/google-cloud-sdk && curl #{gcloud_installer} | sed 's;__SDK_URL_DIR=.*;__SDK_URL_DIR=#{gcloud_folder};' | sed 's/__SDK_TGZ=.*/__SDK_TGZ=#{gcloud_file}/' | bash && ln -sf $CLOUDSDK_INSTALL_DIR/google-cloud-sdk/bin/gcloud /usr/local/bin/gcloud"
  env(
    "CLOUDSDK_CORE_DISABLE_PROMPTS" => "true",
    "CLOUDSDK_INSTALL_DIR" => "/opt", # prefix
  )
  not_if { `true && gcloud -v`.include?(gcloud_version) } # ~FC048
end