Checksum for a whole folder in Ruby

Generates a checksum for a given folder without considering updated_at/created_at/permissions, just the content.

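require 'digest/md5' # the old 'md5' library is gone in current Rubies

def self.checksum(dir)
  # sort so the result does not depend on directory listing order
  files = Dir["#{dir}/**/*"].reject { |f| File.directory?(f) }.sort
  # binread keeps the bytes untouched, whatever the encoding
  content = files.map { |f| File.binread(f) }.join
  Digest::MD5.hexdigest(content)
end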

Simpler, but the checksum then also depends on modification times, ownership, permissions etc.:

Unix

tar cf - /dir | md5sum

Mac

tar cf - /dir | md5
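
The Ruby version above reads every file into memory before hashing, which gets expensive for large folders. A minimal sketch of a streaming variant, assuming Ruby's standard Digest library; the method name streaming_checksum and the 64 KB chunk size are illustrative choices, not part of the original:

```ruby
require 'digest/md5'

# Streams every file under dir through one MD5 digest, in sorted path order.
def streaming_checksum(dir)
  digest = Digest::MD5.new
  files = Dir.glob(File.join(dir, '**', '*')).reject { |f| File.directory?(f) }.sort
  files.each do |path|
    File.open(path, 'rb') do |io|
      # read in 64 KB chunks so a large file never sits fully in memory
      while (chunk = io.read(65_536))
        digest.update(chunk)
      end
    end
  end
  digest.hexdigest
end
```

For the same file order this yields the same digest as hashing the concatenated contents, so it still ignores timestamps and permissions.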

Fixing MemCache IO timeout for memcache-client

A simple hack to get no more memcache timeouts in production.
You should add some kind of error notification above the ‘nil’ line, so you know when memcache is no longer behaving properly.
(If it does not work, check whether MemCache.new.cache_get_with_timeout_protection is defined. If it is not, load the hack in after_initialize.)

code

class MemCache
  # Treat an IO timeout as a cache miss in production/staging,
  # re-raise every other memcache error.
  def cache_get_with_timeout_protection(*args)
    cache_get_without_timeout_protection(*args)
  rescue MemCache::MemCacheError => e
    raise unless e.to_s == 'IO timeout' && (Rails.env.production? || Rails.env.staging?)
    nil
  end
  alias_method_chain :cache_get, :timeout_protection
end

try it

start script/console
kill -s STOP memcache-pid
try reading from cache in console
kill -s CONT memcache-pid
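
alias_method_chain is no longer available in current Rails versions, so here is a rough sketch of the same idea with Module#prepend, including the error notification recommended above. FakeCache, CacheError and the notifier hook are hypothetical stand-ins so the example runs without memcache-client; with the real library you would prepend onto MemCache and rescue MemCache::MemCacheError instead.

```ruby
# Stand-in for MemCache so the sketch runs on its own (hypothetical).
class FakeCache
  class CacheError < StandardError; end

  def cache_get(key)
    raise CacheError, 'IO timeout' if key == 'slow'
    raise CacheError, 'No connection to server' if key == 'down'
    "value-for-#{key}"
  end
end

module TimeoutProtection
  # Hypothetical notifier hook; swap in your real error reporting.
  def self.notifier
    @notifier ||= ->(error) { warn "cache timeout: #{error.message}" }
  end

  def self.notifier=(callable)
    @notifier = callable
  end

  def cache_get(*args)
    super
  rescue FakeCache::CacheError => e
    # anything other than a timeout is re-raised unchanged
    raise unless e.message == 'IO timeout'
    TimeoutProtection.notifier.call(e) # report, then behave like a cache miss
    nil
  end
end

FakeCache.prepend(TimeoutProtection)
```

Unlike the original hack this sketch does not check the Rails environment; add that condition back if timeouts should still raise in development.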

validates_uniqueness_of + mysql == SLOW

ActiveRecord's validates_uniqueness_of produces evil SQL that will not use your existing index!

Before:
SELECT `users`.id FROM `users` WHERE (`users`.`email` = BINARY 'my@email.com' AND `users`.id <> 1234) LIMIT 1; -> 0.80s

After:
SELECT `users`.id FROM `users` WHERE (`users`.`email` = 'my@email.com' AND `users`.id <> 1234) LIMIT 1; -> 0.00s

Hack to make AR use the faster query, at the cost that uniqueness checks can no longer be case-sensitive.

# validates_uniqueness_of produces "column = BINARY 'text'" queries
# which will not use existing indices, so we add this
# EVIL HACK to make
# ALL validates_uniqueness_of case-insensitive
class ActiveRecord::ConnectionAdapters::MysqlAdapter
  def case_sensitive_equality_operator
    "="
  end
end
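
What the trade-off means in practice, as a small Ruby illustration rather than real ActiveRecord code. It assumes the column uses a case-insensitive (*_ci) collation, which is what makes plain "=" ignore case in MySQL; EXISTING_EMAILS and both method names are made up for the example.

```ruby
EXISTING_EMAILS = ['my@email.com'].freeze # hypothetical table contents

# Byte-for-byte comparison, like "= BINARY 'text'".
def taken_binary?(email)
  EXISTING_EMAILS.any? { |existing| existing == email }
end

# Case-insensitive comparison, like "=" on a *_ci collated column.
def taken_ci?(email)
  EXISTING_EMAILS.any? { |existing| existing.casecmp?(email) }
end
```

With the hack in place, 'My@Email.com' counts as already taken, while the BINARY query would have let it through.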

Fixing Varnish restarts after ESI backend dies or times out

Varnish died during peak hours, so we investigated and found that it died only when the ESI backend died. We reproduced this by hitting our test server with ab; the test server had a simple ESI action that sleeps for 10 seconds and has a TTL of 0.
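
A slow ESI action like the one described could look roughly like this as a Rack-style app. This is a sketch, not the action we actually used: the class name is made up, and sending Cache-Control: max-age=0 is an assumption about how the TTL of 0 was achieved.

```ruby
# Minimal Rack-compatible app: sleeps, then returns an uncacheable fragment.
class SlowEsiFragment
  def initialize(delay = 10)
    @delay = delay # seconds to stall before answering
  end

  def call(_env)
    sleep @delay
    headers = { 'Content-Type' => 'text/html', 'Cache-Control' => 'max-age=0' }
    [200, headers, ['<p>slow fragment</p>']]
  end
end
```

Mounted behind the ESI backend and hit with ab through Varnish, every include then blocks a worker thread for the full delay.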

Error
And look what we found in /var/log/syslog …

Child (10211) Panic message: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188: Condition((p) != 0) not true. thread = (cache-worker)sp = 0x7fed9e312008 { fd = 14, id = 14, xid = 177614113, client = 127.0.0.1:36835, step = STP_LOOKUP, handling = hash, ws = 0x7fed9e312078 { overflow id = “sess”, {s,f,r,e} = {0x7fed9e312808,,+16364,(nil),+16384}, }, worker = 0x47002bc0 { }, vcl = { srcname = { “input”, “Default”, }, }, },

Searching around the web suggested increasing the session workspace, which helped a bit. Decreasing the ESI backend timeout (first_byte_timeout, not connect_timeout) and increasing the number of threads (thread_pool_min or thread_pools, anything that yields more threads) also improved the situation.

Semi-Solution

# vcl
backend esi_backend {
  ...
  .first_byte_timeout = 3s;
}

# startup params
-p sess_workspace=524288
-p thread_pools=8 # should be number of cpus
-p thread_pool_min=500

These settings are for Varnish 2.0.4. There are some ESI changes/fixes in trunk, so retest before simply reusing them in a newer version; sess_workspace in particular is pretty high and will use more RAM than necessary. Do not use 2.0.5 with ESI.

Solution
Using SSI in front of Varnish instead, we could keep almost all configuration the same, and instantly every problem was solved!
(We could also remove some unnecessary logic from Varnish's VCL that handled adding/removing cookies after/before ESI.)

Setting dynamic ttl from varnish headers in vcl

Objective: Turn “6m” into 6*60

set req.http.X-ttl = "60s";
...
call set_ttl;
...
# set cached obj.ttl from req.http.X-ttl
sub set_ttl {
  #TODO
}

First version: converting “60s” to 60:

C{
  char *ttl;
  ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars, header was set on req
  VRT_l_obj_ttl(sp, atoi(ttl));
}C

Second version: use brute force to convert 60s, 60m and 60h

if( req.http.X-ttl ~ "s$"){ # seconds
  C{
    char *ttl;
    ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars
    VRT_l_obj_ttl(sp, atoi(ttl));
  }C
} elseif ( req.http.X-ttl ~ "m$") { # minutes
  C{
    char *ttl;
    ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars
    VRT_l_obj_ttl(sp, atoi(ttl) * 60);
  }C
} elseif ( req.http.X-ttl ~ "h$") { # hours
  C{
    char *ttl;
    ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars
    VRT_l_obj_ttl(sp, atoi(ttl) * 60 * 60);
  }C
}
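
The conversion these three branches perform, written out once in Ruby as a reference for what any nicer VCL/C version has to compute. It is only a sketch of the arithmetic, not something Varnish can run; ttl_seconds and UNIT_SECONDS are names made up for the example.

```ruby
UNIT_SECONDS = { 's' => 1, 'm' => 60, 'h' => 3600 }.freeze

# Returns the TTL in seconds, or nil when the value is not "<digits><s|m|h>".
def ttl_seconds(value)
  match = /\A(\d+)([smh])\z/.match(value.to_s)
  return nil unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

ttl_seconds('6m')  # => 360
ttl_seconds('60s') # => 60
```

Unlike atoi, this rejects values such as "6x" or an empty header instead of silently returning 0.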

Third version: TODO

  • use TimeUnit from vcc_parse.c
  • build parser myself
  • ask for help in #varnish

If all else fails: Study the compiled C code: varnishd -d -f foo.vcl -C