OPA Gatekeeper Rego for Istio Port Name convention in Kubernetes

We check that all of our Service objects match Istio's port naming convention (for example, named http or starting with http-). For this we developed a policy that rejects any port that does not match, and allows opting out via namespace labels.

package k8svalidistioserviceportname

violation[{"msg": msg}] {
  valid := "^(grpc|http|http2|https|mongo|mysql|redis|tcp|tls|udp)($|-)"
  service := input.review.object
  port := service.spec.ports[_]
  not valid_port(port, valid)

  msg := sprintf(
    "%v %v %v: port name must match %v to be routable by Istio",
    [service.kind, service.metadata.namespace, service.metadata.name, valid]
  )
}

valid_port(port, valid) {
  re_match(valid, port.name)
}
The tests live in a separate file in the same package:

package k8svalidistioserviceportname

test_ignores_exact_match {
  count(violation) == 0 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{"name":"https"}]}}}}
}

test_ignores_prefix_match {
  count(violation) == 0 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{"name":"https-foobar"}]}}}}
}

test_blocks_bad_match {
  count(violation) == 1 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{"name":"httpsfoobar"}]}}}}
}

test_blocks_empty {
  count(violation) == 1 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{}]}}}}
}

test_blocks_multiple_bad {
  count(violation) == 1 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{}, {}]}}}}
}
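The namespace opt-out is not part of the Rego above; with Gatekeeper it can be expressed on the Constraint that applies the template. A sketch, assuming the template was registered as kind K8sValidIstioServicePortName and that namespaces opt out via a label (both names are assumptions, not taken from our actual setup):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sValidIstioServicePortName
metadata:
  name: valid-istio-service-port-name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]
    # namespaces opt out by labeling themselves, e.g.
    # istio-port-name-check: disabled
    namespaceSelector:
      matchExpressions:
        - key: istio-port-name-check
          operator: NotIn
          values: ["disabled"]
```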

Testing Rego with enforced code coverage

A Ruby rake task we use to test our Rego policies. They need to be in the policies/ folder. Any line that is not exercised by tests will make the task fail.

require 'json'

desc "Test policies"
task test: ["update:opa"] do
  output = `opa test --coverage --verbose policies/* 2>&1`
  abort output unless $?.success?

  coverage = JSON.parse(output).fetch("files")
  policy_files = Dir["policies/*.rego"] # every policy must show up in the coverage report
  errors = policy_files.flat_map do |policy|
    next [policy] unless result = coverage[policy] # untested

    (result["not_covered"] || []).map do |line|
      start = line.dig("start", "row")
      finish = line.dig("end", "row")
      "#{policy}:#{start}#{"-#{finish}" if start != finish}"
    end
  end
  abort "Missing coverage:\n#{errors.join("\n")}" if errors.any?
end
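For reference, a minimal sketch of the coverage report shape the task parses; the policy path and row numbers below are made up, but the keys ("files", "not_covered", "start"/"end" rows) are the ones the task reads:

```ruby
require 'json'

# Abridged `opa test --coverage` report; file name and rows are illustrative.
report = JSON.parse(<<~JSON)
  {"files": {"policies/istio.rego": {"not_covered": [{"start": {"row": 9}, "end": {"row": 11}}]}}}
JSON

# same formatting the rake task uses for its error lines
uncovered = report.fetch("files").flat_map do |policy, result|
  (result["not_covered"] || []).map do |line|
    start = line.dig("start", "row")
    finish = line.dig("end", "row")
    "#{policy}:#{start}#{"-#{finish}" if start != finish}"
  end
end
puts uncovered # policies/istio.rego:9-11
```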

Simple Kubernetes Leader Election via Entrypoint script

Leader election in Kubernetes is often done via sidecars plus Endpoints or Leases, which is a lot of complexity compared to ConfigMap-based locking (as used by operator-sdk); it also avoids having the leader move around during execution.

kube-leader provides a downloadable binary that implements leader election via a Docker ENTRYPOINT. Add it to your Dockerfile, add the Kubernetes env vars/permissions, and you are done.
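A minimal sketch of the Dockerfile side; the paths and the wrapped command are assumptions (check the project's README for the exact invocation):

```dockerfile
FROM alpine
# kube-leader binary, downloaded from its releases page (path is an assumption)
COPY kube-leader /bin/kube-leader
COPY my-app /bin/my-app
# the entrypoint blocks until this pod wins the election, then runs the command
ENTRYPOINT ["/bin/kube-leader"]
CMD ["/bin/my-app"]
```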

Verify Pagerduty reaches On-Call by Cron

We had a few incidents where on-call devs missed their calls because of various spam-blocking setups or “do not disturb” settings.
We now run a small service that test-notifies everyone once a month to make sure notifications go through. Notifications go out shortly before their “do not disturb” window ends, so we do not wake them in the middle of the night but still get a realistic test.
Our setup has more logging/stats etc, but it goes something like this:

# configure user schedule
require 'yaml'
users = YAML.load <<~YAML
  - name: "John Doe"
    id: ABCD
    # cron: "* * * * * America/Los_Angeles" # every minute ... for local testing
    cron: "55 6 * * 2#1 America/Los_Angeles" # every first Tuesday of the month at 6:55am
  # ... more users here
YAML

# code to notify users
require 'json'
require 'faraday'
def create_test_incident(user)
  connection = Faraday.new
  response = nil
  2.times do
    response = connection.post do |req|
      req.url "https://api.pagerduty.com/incidents"
      req.headers['Content-Type'] = 'application/json'
      req.headers['Accept'] = 'application/vnd.pagerduty+json;version=2'
      req.headers['From'] = 'realusers@email.com' # incident owner
      req.headers['Authorization'] = "Token token=#{ENV.fetch("PAGERDUTY_TOKEN")}"
      req.body = {
        incident: {
          type: "incident",
          title: "Pagerduty Tester: Incident for #{user.fetch("name")}, press resolve",
          service: {
            id: ENV.fetch("SERVICE_ID"),
            type: "service_reference"
          },
          assignments: [{
            assignee: {
              id: user.fetch("id"),
              type: "user_reference"
            }
          }]
        }
      }.to_json
    end
    break unless response.status == 429 # pagerduty rate-limits to 6 incidents/min/service
    sleep 60
  end
  raise "Request failed #{response.status} -- #{response.body}" if response.status >= 300
  JSON.parse(response.body).dig("incident", "id") # return the new incident's id
end

# run on a schedule (no threading / forking)
require 'serial_scheduler'
require 'fugit'
scheduler = SerialScheduler.new
users.each do |user|
  scheduler.add("Notify #{user.fetch("name")}", cron: user.fetch("cron"), timeout: 10) do
    user_id = user.fetch("id")
    incident_id = create_test_incident(user)
    puts "Created incident for #{user_id} https://#{ENV.fetch('SUBDOMAIN')}.pagerduty.com/incidents/#{incident_id}"
  rescue StandardError => e
    puts "Creating incident for #{user_id} failed #{e}"
  end
end
scheduler.run # blocks and triggers each job on its cron schedule