Fixing Varnish restarts after the ESI backend dies or times out

Varnish kept dying during peak hours. When we investigated, we found that it died only when the ESI backend died. We reproduced this by hitting our test server with ab; the test server had a simple ESI action that sleeps for 10 seconds and has a TTL of 0.

Error
And look what we found in /var/log/syslog …

Child (10211) Panic message: Missing errorhandling code in HSH_Prepare(), cache_hash.c line 188: Condition((p) != 0) not true. thread = (cache-worker)sp = 0x7fed9e312008 { fd = 14, id = 14, xid = 177614113, client = 127.0.0.1:36835, step = STP_LOOKUP, handling = hash, ws = 0x7fed9e312078 { overflow id = “sess”, {s,f,r,e} = {0x7fed9e312808,,+16364,(nil),+16384}, }, worker = 0x47002bc0 { }, vcl = { srcname = { “input”, “Default”, }, }, },

Searching around the web suggested increasing the session workspace, which helped a bit. Decreasing the ESI timeout (first_byte_timeout, not connect_timeout) and increasing the number of threads (thread_pool_min or thread_pools, anything that gets you more of them) also improved the situation.

Semi-Solution

# vcl
backend esi_backend {
  ...
  .first_byte_timeout = 3s;
}

# startup params
-p sess_workspace=524288
-p thread_pools=8 # should be number of cpus
-p thread_pool_min=500

These settings are for 2.0.4. There are some ESI changes and fixes in trunk, so retest before simply reusing them in a newer version (e.g. sess_workspace is pretty high and will use more RAM than necessary). Do not use 2.0.5 with ESI.

Solution
By using SSI in front of Varnish, we could keep almost all of the configuration the same, and every problem was solved instantly!
(We could also remove some unnecessary logic from Varnish's VCL that normally handled adding and removing cookies after and before ESI.)

Setting dynamic ttl from varnish headers in vcl

Objective: Turn “6m” into 6*60

set req.http.X-ttl = "60s";
...
call set_ttl;
...
# set the cached obj.ttl from req.http.X-ttl
sub set_ttl {
  # TODO
}

First version: converting “60s” to 60:

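C{
  char *ttl;
  // "\06" is the length of "X-ttl:" (6 chars); the header was set on req
  ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:");
  if (ttl != NULL) // header may be missing; atoi(NULL) would crash
    VRT_l_obj_ttl(sp, atoi(ttl)); // atoi stops at the "s" suffix
}C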
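
The first version relies on two details: the header name passed to VRT_GetHdr carries its own length as the first byte, and atoi stops at the first non-digit, so the unit suffix is simply dropped. Here is a minimal standalone C sketch of both. The helper names hdr_prefix_matches and ttl_digits are made up for illustration and are not part of Varnish.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the first byte of a length-prefixed
 * header name equals the length of the rest (name plus trailing colon). */
int hdr_prefix_matches(const char *hdr)
{
    return (size_t)(unsigned char)hdr[0] == strlen(hdr + 1);
}

/* Hypothetical helper: atoi() reads the leading decimal digits and stops
 * at the first other character, so "60s" and "60m" both yield 60. */
int ttl_digits(const char *value)
{
    return atoi(value);
}
```

This is also why the first version is only correct for seconds: "6m" would be read as 6, not 360.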

Second version: use brute force to convert 60s, 60m and 60h

if( req.http.X-ttl ~ "s$"){ # seconds
  C{
    char *ttl;
    ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars
    VRT_l_obj_ttl(sp, atoi(ttl));
  }C
} elseif ( req.http.X-ttl ~ "m$") { # minutes
  C{
    char *ttl;
    ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars
    VRT_l_obj_ttl(sp, atoi(ttl) * 60);
  }C
} elseif ( req.http.X-ttl ~ "h$") { # hours
  C{
    char *ttl;
    ttl = VRT_GetHdr(sp, HDR_REQ, "\06X-ttl:"); // 6 == 6 chars
    VRT_l_obj_ttl(sp, atoi(ttl) * 60 * 60);
  }C
}

Third version: TODO

use TimeUnit from vcc_parse.c
build parser myself
ask for help in #varnish

If all else fails: Study the compiled C code: varnishd -d -f foo.vcl -C
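
For the "build parser myself" option, a rough sketch in plain C could look like the following. It is not taken from Varnish: ttl_to_seconds is a hypothetical helper name, and treating a bare number as seconds and rejecting anything other than s, m or h are my assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Varnish: converts "60s", "6m" or "2h"
 * to seconds. Returns -1 for NULL, empty, negative, overflowing or
 * unknown-unit input. Assumption: a bare number means seconds. */
long ttl_to_seconds(const char *value)
{
    char *end;
    long n, factor;

    if (value == NULL)
        return -1;

    errno = 0;
    n = strtol(value, &end, 10);
    if (end == value || errno != 0 || n < 0)
        return -1;               /* no digits, out of range, or negative */

    /* Allow optional whitespace between the number and the unit. */
    while (isspace((unsigned char)*end))
        end++;

    switch (*end) {
    case '\0':                   /* bare number: assumed to be seconds */
    case 's': factor = 1;    break;
    case 'm': factor = 60;   break;
    case 'h': factor = 3600; break;
    default:  return -1;         /* unknown unit */
    }

    /* Reject trailing characters after the unit, e.g. "6mm". */
    if (*end != '\0' && end[1] != '\0')
        return -1;

    if (n > LONG_MAX / factor)
        return -1;               /* multiplication would overflow */

    return n * factor;
}
```

If inline C in your Varnish version lets you define a function like this in a top-level C{ }C block (untested here), the three branches of the second version could collapse into one: pass the result of VRT_GetHdr to the helper and call VRT_l_obj_ttl only when it returns a non-negative value.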

Do not use Varnish 2.0.5 + ESI

This release contains a very evil ESI bug that we only found after endless debugging. It has been known for two weeks (since 16.11.) and will be fixed in the next release…

Michael Grosser, the Blog
