Monitor Dalli Connection Changes + Failures

When the set of reachable memcached servers changes, you get a tiny split-brain scenario, since some app servers might read from different caches than the others. It is good to keep an eye on this and alert when it happens too often. Here is a tiny snippet to report when it happens.

# config/initializers/dalli.rb
# Whenever the alive-ness of a server changes we read keys from a different server
# which leads to stale keys on the old server and cache-misses on the new servers
# so this should not happen often
# see lib/dalli/server.rb
#
# reproduce: rails c + Rails.cache.read + zdi memcached stop & start
Dalli::Server.prepend(Module.new do
  def down!
    $statsd.increment "dalli.connection_changed", tags: ["state:down"] unless @down_at
    super
  end

  def up!
    $statsd.increment "dalli.connection_changed", tags: ["state:up"] if @down_at
    super
  end

  def failure!(*)
    $statsd.increment "dalli.failed"
    super
  end
end)
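
To check the reporting logic without a running memcached, here is a minimal sketch of the same prepend pattern. `FakeServer` and `CountingStatsd` are hypothetical stand-ins (neither is part of Dalli or of any statsd client), and `FakeServer` only mimics the `@down_at` bookkeeping that the snippet above relies on.

```ruby
# Hypothetical stand-in for a statsd client: counts increments in memory.
class CountingStatsd
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def increment(name, tags: [])
    @counts[[name, tags]] += 1
  end
end

# Hypothetical stand-in for Dalli::Server: only tracks @down_at.
class FakeServer
  def down!
    @down_at ||= Time.now
  end

  def up!
    @down_at = nil
  end
end

$statsd = CountingStatsd.new

# Same pattern as the initializer: report only real state transitions.
FakeServer.prepend(Module.new do
  def down!
    $statsd.increment "dalli.connection_changed", tags: ["state:down"] unless @down_at
    super
  end

  def up!
    $statsd.increment "dalli.connection_changed", tags: ["state:up"] if @down_at
    super
  end
end)

server = FakeServer.new
server.down!
server.down! # already down, not reported again
server.up!
server.up!   # already up, not reported again

p $statsd.counts
```

Each transition is counted once, so repeated `down!` calls during an outage do not inflate the metric.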

Fixing MemCache IO timeout for memcache-client

A simple hack to stop memcache timeouts from raising errors in production.
You should add some kind of error notification above the ‘nil’ line, so you know when memcache is no longer behaving properly.
(If it does not work, check whether MemCache.new.cache_get_with_timeout_protection is defined. If it is not, load the hack in after_initialize.)

code

class MemCache
  def cache_get_with_timeout_protection(*args)
    cache_get_without_timeout_protection(*args)
  rescue MemCache::MemCacheError => e
    # swallow IO timeouts in production and staging only; everything else is re-raised
    raise unless e.to_s == 'IO timeout' && (Rails.env.production? || Rails.env.staging?)
    # treat the timeout as a cache miss
    nil
  end
  alias_method_chain :cache_get, :timeout_protection
end
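
`alias_method_chain` was deprecated in Rails 5 in favor of `Module#prepend`, so on newer setups the same idea can be sketched as below. This is an illustration, not the original hack: `FakeMemCache`, `TimeoutProtection` and `$timeout_notifications` are hypothetical names, the stand-in class only imitates a `cache_get` that raises, and the environment check is left out to keep the sketch runnable outside Rails.

```ruby
# Hypothetical stand-in for memcache-client's MemCache so the sketch runs on its own.
class FakeMemCache
  class MemCacheError < StandardError; end

  def cache_get(key)
    raise MemCacheError, "IO timeout" if key == "slow"
    raise MemCacheError, "No servers available" if key == "broken"
    "value for #{key}"
  end
end

# Hypothetical notification sink; swap in your exception tracker or logger.
$timeout_notifications = []

module TimeoutProtection
  def cache_get(*args)
    super
  rescue FakeMemCache::MemCacheError => e
    # Anything other than an IO timeout is re-raised unchanged.
    raise unless e.message == "IO timeout"
    $timeout_notifications << e.message # error notification goes here
    nil # treat the timeout as a cache miss
  end
end

FakeMemCache.prepend(TimeoutProtection)

cache = FakeMemCache.new
p cache.cache_get("fast")
p cache.cache_get("slow")
```

With `prepend`, the wrapper calls the original method through `super`, so no `_with_` / `_without_` aliases are needed.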

try it

start script/console
kill -s STOP memcache-pid
try reading from cache in console
kill -s CONT memcache-pid

Finding the oldest element in memcached

We always wanted to know how full memcached is, and therefore at which age an element is dropped. This hacky script finds out by inserting 31 probe values each day (numbered 0 to 30) and reading back one untouched probe from today and from each of the 30 previous days. Every probe is read only once, so reading it never refreshes it. If one is missing, that's how old your oldest element is.
(if you know a better way, let me know 😉 )

Usage
Run once a day (via cron) and store the output in a logfile. After 26 days the output looks like this:

rake check_memcached_age
Stats for 2009-12-01:
0: still there...
1: still there...
2: still there...
3: still there...
...
23: still there...
24: still there...
25: 
26: 
...

Your cache is 24 days old!

Script

task :check_memcached_age => :environment do
  cache = ActionController::Base.cache_store
  # insert probes for today
  (0..30).each do |i|
    cache.write "memcached-probe-#{Date.today.to_s(:db)}--#{i}", 'still there...'
  end

  # extract old probes
  results = (0..30).map do |i|
    [i, cache.read("memcached-probe-#{(Date.today-i.days).to_s(:db)}--#{i}")]
  end

  puts "Stats for #{Date.today.to_s(:db)}:"
  puts results.map { |day, present| "#{day}: #{present}" }.join("\n")
end
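
The age can also be derived from `results` instead of being read off the log by eye. A minimal sketch, assuming the same `[day, value]` pairs the task builds; `oldest_surviving_day` is a hypothetical helper name, not part of the original script.

```ruby
# results: array of [day_offset, value_or_nil] pairs ordered by day_offset,
# in the same shape the rake task builds.
# Returns the offset of the last probe that is still present before the
# first missing one, or nil if even today's probe is gone.
def oldest_surviving_day(results)
  survivors = results.take_while { |_day, value| !value.nil? }
  survivors.empty? ? nil : survivors.last.first
end

# Example data matching the log above: probes up to day 24 survived.
results = (0..30).map { |i| [i, i <= 24 ? "still there..." : nil] }
age = oldest_surviving_day(results)
puts "Your cache is #{age} days old!" if age
# prints "Your cache is 24 days old!"
```

If every probe is still present, the helper returns the largest offset checked. That only means the cache is at least that old, or that the task has not been running long enough yet.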