ETCD db size based compaction

See Gist for a background job that compacts+defrags etcd when the db grew too big, either because the api server was not doing it’s job or the db grew too quickly.

If this happens often you should consider growing the db size or fixing the api in some other way … also add some reporting so you are aware of the compactions since they can interrupt users queries.

Speeding up kubectl with –raw

time kubectl get pods >/dev/null # 12s
time kubect get --raw /api/v1/pods?resourceVersion=0  >/dev/null # 2s

Just using –raw is already faster, but telling the api that you want to read from cache with resourceVersion=0 speeds it up even more and takes load of the api-server. (be careful that reading from cache and using limit= don’t mix)

(Using –raw is similar to using curl directly, but it has the advantage of looking up the api server for you)

OPA Gatekeeper Rego for Istio Port Name convention in Kubernetes

We check that all our Service objects match Istios port convention (start with http- or named http for example). For this we developed a policy that rejects any ports that don’t match and allow opting out via namespace labels.

package k8svalidistioserviceportname

violation[{"msg": msg}] {
  valid := "^(grpc|http|http2|https|mongo|mysql|redis|tcp|tls|udp)($|-)"
  service := input.review.object
  port := service.spec.ports[_]
  not valid_port(port, valid)

  msg := sprintf(
    "%v %v %v: port name must match %v to be routable by Istio",
    [service.kind, service.metadata.namespace, service.metadata.name, valid]
  )
}

valid_port(port, valid) {
  port.name
  re_match(valid, port.name)
}
package k8svalidistioserviceportname

test_ignores_exact_match {
  count(violation) == 0 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{"name":"https"}]}}}}
}

test_ignores_prefix_match {
  count(violation) == 0 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{"name":"https-foobar"}]}}}}
}

test_blocks_bad_match {
  count(violation) == 1 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{"name":"httpsfoobar"}]}}}}
}

test_blocks_empty {
  count(violation) == 1 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{}]}}}}
}

test_blocks_multiple_bad {
  count(violation) == 1 with input as {"review":{"object":{"kind":"Service","metadata":{"name":"truth-service","namespace":"mesh-enabled"},"spec":{"ports":[{}, {}]}}}}
}

Testing Rego With enforced code coverage

A ruby script we use to test our Rego policies. They need to be in the policies/ folder. Each line that is not exercised by tests will make it fail.

desc "Test policies"
task test: ["update:opa"] do
  output = `opa test --coverage --verbose policies/* 2>&1`
  abort output unless $?.success?

  coverage = JSON.parse(output).fetch("files")
  errors = policy_files.flat_map do |policy|
    return [policy] unless result = coverage[policy] # untested

    (result["not_covered"] || []).map do |line|
      start = line.dig("start", "row")
      finish = line.dig("end", "row")
      "#{policy}:#{start}#{"-#{finish}" if start != finish}"
    end
  end
  abort "Missing coverage:\n#{errors.join("\n")}" if errors.any?
end