Parallelizing SparkleFormation execution

We generate each template's CloudFormation .json file and check it in, so subtle diffs show up when refactoring. We also validate all configs on every PR. Doing this serially takes a long time, but it parallelizes easily, while also avoiding Ruby boot overhead.

This cut our runtime from ~5 minutes to 30s, enjoy!

desc "Validate templates via cloudformation api"
task :validate do
  each_template do |template|
    execute_sfn_command :validate, template
  end
end

desc "Generates cloudformation json files"
task :generate, [:pattern] do |_t, args|
  pattern = /#{args[:pattern]}/ if args[:pattern]
  previous = Dir["generated/*.json"]

  used = each_template do |template|
    next if pattern && template !~ pattern
    output = execute_sfn_command :print, template

    generated = "generated/#{File.basename(template.sub('.rb', '.json'))}"
    File.write generated, output
    generated
  end

  (previous - used).each { |f| File.unlink(f) } unless pattern
end

...

require 'parallel'
require 'timeout'
require 'stringio' # used by capture_stdout

def each_template(&block)
  # preload slow requires
  require 'bogo-cli'
  require 'sfn'

  # run in parallel, but isolated to avoid caches from being reused
  options = {in_processes: 10, progress: "Progress", isolation: true}
  Parallel.map(Dir["templates/**/*.rb"], options, &block)
end

private

def execute_sfn_command(command, template, *args)
  Timeout.timeout(20) do
    capture_stdout do
      Sfn::Command.const_get(command.capitalize).new({
        defaults: true, # do not ask questions about parameters
        file: template,
        retry: {type: :flat, interval: 1} # we will run into rate limits, ignore them quickly
      }, args).execute!
    end
  end
rescue StandardError
  # give users context when something failed
  warn "bundle exec sfn #{command} #{args.join(" ")} --defaults -f #{template}"
  raise
end

def capture_stdout
  old = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = old
end
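
To illustrate why preloading the slow requires before going parallel pays off, here is a stdlib-only sketch of the fork-per-item idea: each child inherits whatever the parent already loaded, but its in-memory state stays separate. This is not how the parallel gem is implemented; `map_in_forks` is a hypothetical helper, and it assumes a platform that supports `fork` (so not Windows).

```ruby
# Hypothetical helper: run the block for each item in its own forked child
# and send the result back to the parent through a pipe.
def map_in_forks(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write Marshal.dump(yield(item))
      writer.close
      exit!(0) # skip at_exit hooks inherited from the parent
    end
    writer.close # parent keeps only the read end
    [pid, reader]
  end

  # collect results in the original order
  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

lengths = map_in_forks(%w[a.rb bb.rb ccc.rb]) { |template| template.length }
```

Unlike this sketch, the real setup above caps the number of workers at 10 and shows progress, which is what the parallel gem's options are for.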

Automated lambda code upload to S3 with CloudFormation

Maintaining Lambda code directly in CloudFormation only works via the ZipFile property on Node.js, and even there it is limited to 2000 characters. For Python and for bigger Lambdas, we now use this Ruby script to upload the code to S3 and generate the Handler and Code properties that are set in the CloudFormation template.

require 'aws-sdk-core'
require 'digest' # for Digest::MD5

def code_via_s3(file, handler)
  bucket = "my-lambda-staging-area"
  content = File.read(file)
  sha = Digest::MD5.hexdigest(content)
  key = "#{file.gsub('/', '-')}-#{sha}.zip"

  # zip up the content (gzip is not supported)
  # the file needs to be at the root of the zip to support inline editing
  # and to match the handler name
  content = `cd #{File.dirname(file)} && zip --quiet - #{File.basename(file)}`
  raise "Zip failed" unless $?.success?

  # upload to s3 (overwriting it ... checking for existence takes the same time ...)
  c = Aws::S3::Client.new
  begin
    c.put_object(body: content, bucket: bucket, key: key)
  rescue Aws::S3::Errors::NoSuchBucket
    c.create_bucket(bucket: bucket)
    retry
  end

  {
    "Handler" => File.basename(file).sub(/\..*/, '') + '.' + handler,
    "Code" => {"S3Bucket" => bucket, "Key" => key}
  }
end
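
The naming scheme is the part worth a closer look: because the content digest is part of the key, changed code yields a new S3 key, and, as far as I know, a changed Key in the template is what lets CloudFormation notice that the function needs updating. A minimal sketch of just that naming, without touching S3 (`lambda_names` and the file names are hypothetical):

```ruby
require 'digest'

# Hypothetical helper mirroring the key/handler naming of code_via_s3
def lambda_names(file, content, handler)
  digest = Digest::MD5.hexdigest(content)
  {
    key: "#{file.gsub('/', '-')}-#{digest}.zip",
    handler: "#{File.basename(file).sub(/\..*/, '')}.#{handler}"
  }
end

v1 = lambda_names("lambdas/cleanup.py", "print('v1')", "handler")
v2 = lambda_names("lambdas/cleanup.py", "print('v2')", "handler")

puts v1[:handler]          # module name plus function name
puts v1[:key] == v2[:key]  # different content, different key
```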

Stubbing Ruby AWS SDK XML with webmock

Stubbing the XML that AWS expects is not easy (lists need member keys), and it has lots of repetitive elements like XyzResponse + XyzResult, so I wanted to share a few helpers that keep it DRY.

(Alternative: use the SDK's stub_responses, example PR)

  # turn ruby hashes into aws style xml (Hash#to_xml comes from ActiveSupport)
  def fake_xml(name, body={})
    xml = {"#{name}Result" => body}
      .to_xml(root: "#{name}Response", camelize: true)
      .gsub(/ type="array"/, '')
    loop do
      break unless xml.gsub!(%r{<(\S+)s>\s*<\1>(.*?)</\1>\s*</\1s>}m, "<\\1s><member>\\2</member></\\1s>")
    end
    xml
  end

  def expect_aws_request(method, url, action, response={})
    request = stub_request(method, url).
      with(:body => /Action=#{action}(&|$)/)
    request = if response.is_a?(Exception)
      request.to_raise(response)
    else
      request.to_return(:body => fake_xml(action, response))
    end
    requested << request
    request
  end

  def expect_upload_certificate
    expect_aws_request(
      :post, "https://iam.amazonaws.com/",
      "UploadServerCertificate",
      {server_certificate_metadata: {arn: 'FAKE-ARN'}}
    )
  end

  after { requested.each { |r| assert_requested r } }

  it "uploads a cert" do
    expect_upload_certificate
    manager.upload.must_equal 'FAKE-ARN'
  end
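
To see what the rewrite loop in fake_xml does without pulling in ActiveSupport, here is the same pattern applied to a hand-written XML string. `wrap_members` and the sample XML are made up for illustration.

```ruby
# Same pattern as in fake_xml: <Xs><X>...</X></Xs> becomes <Xs><member>...</member></Xs>
MEMBER_LIST = %r{<(\S+)s>\s*<\1>(.*?)</\1>\s*</\1s>}m

# Hypothetical helper: repeat the substitution until nothing is left to rewrite
def wrap_members(xml)
  xml = xml.dup
  loop do
    break unless xml.gsub!(MEMBER_LIST, "<\\1s><member>\\2</member></\\1s>")
  end
  xml
end

puts wrap_members("<Tags><Tag><Key>env</Key></Tag></Tags>")
# prints <Tags><member><Key>env</Key></member></Tags>
```

Note that the pattern wraps everything between the first opening and the last closing item tag in a single member, so it fits single-element lists; lists with several elements would need a different rewrite.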

Improving sparkle_formation method_missing

SparkleFormation has a habit of swallowing all typos, which makes debugging hard:

foo typo
dynanic! :bar
# ... builds
{
  "typo": {},
  "foo": "<#SparkleFormation::Struct",
  "dynanic!": {"bar": {}}
}

Let’s make these calls fail:

  • calls with no arguments and no block
  • names that look like a method (start with _ or end with !)

# calling methods without arguments or blocks smells like a method missing
::SparkleFormation::SparkleStruct.prepend(Module.new do
   def method_missing(name, *args, &block)
     caller = ::Kernel.caller.first

     called_without_args = (
       args.empty? &&
       !block &&
       caller.start_with?(File.dirname(File.dirname(__FILE__))) &&
       !caller.include?("vendor/bundle")
     )
     internal_method = (name =~ /^_|\!$/)

     if called_without_args || internal_method
       message = "undefined local variable or method `#{name}` (use block helpers if this was not a typo)"
       ::Kernel.raise NameError, message
     end
     super
   end
end)
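
The same prepend technique can be tried out on a plain class. This is a minimal sketch: `LooseStruct` is a hypothetical stand-in for a permissive DSL struct, and the caller-path check from above is left out.

```ruby
# Hypothetical stand-in: every unknown call is silently stored
class LooseStruct
  attr_reader :data

  def initialize
    @data = {}
  end

  def method_missing(name, *args, &block)
    @data[name] = args.first
  end
end

# Prepended module runs first and rejects calls that look like typos
LooseStruct.prepend(Module.new do
  def method_missing(name, *args, &block)
    if (args.empty? && !block) || name =~ /^_|!$/
      raise NameError, "undefined local variable or method `#{name}`"
    end
    super
  end
end)

struct = LooseStruct.new
struct.foo "bar" # still accepted, has an argument

typo_caught =
  begin
    struct.typo # no arguments, no block
    false
  rescue NameError
    true
  end
```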

Private gem leak / attack tester

A script to run on CI to make sure that:

  • no private gems are accidentally listed on rubygems.org (rake release happily does that for you)
  • nobody is trying to attack your private gems by releasing similarly named ones

This is written for a https://github.com/geminabox/geminabox server secured via https://github.com/zendesk/geminastrongbox and might need to be modified to fit other gem servers.

#!/usr/bin/env ruby
def sh(command)
  result = `#{command}`
  raise "FAILED #{result}" unless $?.success?
  result
end

key = ENV.fetch('PRIVATE_SERVER_KEY')
host = ENV.fetch('PRIVATE_SERVER_HOST')
private_gem_names = sh("curl -fs 'https://#{key}@#{host}/gems'")
private_gem_names = private_gem_names.scan(%r{"#{host}/gems/gems/([^"]+)"}).flatten
puts "Found #{private_gem_names.size} private gems"
puts private_gem_names.join(", ")

exposed = sh("curl -fs 'https://rubygems.org/api/v1/dependencies?gems=#{private_gem_names.join(",")}'")
exposed = Marshal.load(exposed).map { |d| d[:name] }.uniq
puts "Found #{exposed.size} of them on rubygems.org"
puts exposed.join(", ")

if exposed.sort == ["LIST KNOWN DUPLICATES HERE"].sort
  puts "All good!"
else
  raise "Hacked private gems !?: #{exposed.join(", ")}"
end
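
The strict equality above also fails when a known duplicate disappears from rubygems.org. If that turns out to be too noisy, a set difference is one possible alternative. A small sketch, with `unexpected_exposures` and the gem names being placeholders:

```ruby
# Hypothetical variant of the final check: only names outside the allowlist count
def unexpected_exposures(exposed, known_duplicates)
  exposed.uniq - known_duplicates
end

known = ["shared-gem"] # placeholder for names that legitimately exist on both servers
leaked = unexpected_exposures(["shared-gem", "my-private-gem"], known)
warn "Unexpected gems on rubygems.org: #{leaked.join(', ')}" unless leaked.empty?
```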