“Read ruby” (the ruby 1.9 book) as PDF

Open books (like Read Ruby, organized by runpaint) are great,
but when there is no PDF version, how do you read or print them properly?

Converting to pdf…

require 'rubygems'
require 'open-uri'
require 'hpricot'

url = 'http://ruby.runpaint.org'

# collect all hrefs; compact drops anchors that have no href attribute
links = Hpricot(open(url).read).search('a').map{|link| link['href'] }.compact
links.reject!{|link| link.include?('#') or link.include?('//') or link.include?('@') }
links -= ['/opensearch', '/toc']
links.unshift('/toc')
links.map!{|link| url+link }

content = links.map{|link| open(link).read }

# join the pages with real newlines ("\n", not "/n")
html = content.join("\n\n\n\n")

out = 'temp.html'
File.open(out,'w'){|f| f.print html }

`wkhtmltopdf #{out} temp.pdf`

(or download the version from 2010-09-21)

Have fun!

Negative queries with solr in multiple fields

We recently did some negative queries and had a lot of ‘fun’ with solr.
After some reading and testing we found a simple rule: negative queries for single words do not work (don't ask me why…), but this can be fixed by adding *:* to the query.

Does not work: title: -xxx / (-title:xxx)

When you are only interested in certain fields, query building gets rather complex:

  • contains foo and bar -> title:(foo bar) OR description:(foo bar)
  • contains foo or bar -> title:(foo OR bar) OR description:(foo OR bar)
  • does not contain foo or bar -> -title:(foo bar *:*) AND -description:(foo bar *:*)
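
The three patterns above can be generated by a small helper. This is only a sketch: `field_query` and its `mode` argument are made-up names, not part of solr or acts_as_solr, and it simply reproduces the query strings from the list (with a colon after every field name).

```ruby
# Hypothetical helper: applies the same terms to several fields.
# mode is :all (foo and bar), :any (foo or bar) or :none (neither).
def field_query(fields, terms, mode)
  case mode
  when :all
    fields.map { |f| "#{f}:(#{terms.join(' ')})" }.join(' OR ')
  when :any
    fields.map { |f| "#{f}:(#{terms.join(' OR ')})" }.join(' OR ')
  when :none
    # negative clauses get the extra *:* described above
    fields.map { |f| "-#{f}:(#{terms.join(' ')} *:*)" }.join(' AND ')
  else
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

fields = %w[title description]
terms  = %w[foo bar]

all_query  = field_query(fields, terms, :all)
any_query  = field_query(fields, terms, :any)
none_query = field_query(fields, terms, :none)
```

The helper does no escaping, so terms with special characters would need to be cleaned up before they are passed in.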

The *:* is killed by acts_as_solr, so the parser needs a little fix too:

# lib/parser_methods.rb:80
# *:xxx -> *:xxx a : b -> a_t:b
query = "(#{query.gsub(/([^\*]) *: */, "\\1_t:")}) #{models}"

(see our branch on github)
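
To see what the changed regexp does, the substitution can be tried on its own. A minimal sketch: `add_text_suffix` is a made-up wrapper name, and the `models` part of the original line is left out.

```ruby
# Hypothetical wrapper around the substitution from the parser fix.
# A colon is rewritten to "_t:" only when the character before it
# (ignoring spaces) is not "*", so *:* survives untouched.
def add_text_suffix(query)
  query.gsub(/([^\*]) *: */, "\\1_t:")
end

plain    = add_text_suffix("title : foo")          # => "title_t:foo"
wildcard = add_text_suffix("*:*")                  # => "*:*"
mixed    = add_text_suffix("-title:(foo bar *:*)") # => "-title_t:(foo bar *:*)"
```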

Hope this helps someone!

Cached .all(:include=>[:xxx]) on associations

When fetching all records of an association with includes, the result is not cached, but it could be, since these are still the same records (unlike with :select/:conditions etc.).

user = User.first
user.comments.length # hits db
user.comments.length # cached

user = User.first
user.comments.all(:include=>:commenter).length  # hits db
user.comments.all(:include=>:commenter).length  # hits db
user.comments.length # hits db

Cached find all with includes
This can save requests when performing repetitive calls to the same record.

user = User.first
user.comments.load_target_with_includes([:commenter, :tags]).length # hits db
user.comments.load_target_with_includes([:commenter, :tags]).length # cached
user.comments.length # cached

Code

# do not load an association twice, when all we need are includes
# all(:include=>xxx) would always reload the target
class ActiveRecord::Associations::AssociationCollection
  def load_target_with_includes(includes)
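    # an unsaved owner has no id to load associated records for
    raise "cannot load associations of a new record" if @owner.new_record?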

    if loaded?
      @target
    else
      @loaded = true
      @target = all(:include => includes)
    end
  end
end
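
The load-once behaviour can be tried without Rails. This is a plain-Ruby stand-in, not the real association class: `FakeComments` and `db_hits` are made-up names that only count simulated queries.

```ruby
# Stand-in for an association collection (hypothetical, no ActiveRecord).
class FakeComments
  attr_reader :db_hits

  def initialize(records)
    @records = records
    @db_hits = 0
    @loaded  = false
  end

  # Simulates all(:include => ...): every call counts as a query.
  def all(options = {})
    @db_hits += 1
    @records.dup
  end

  # Same shape as the patch above: load once, then reuse @target.
  def load_target_with_includes(includes)
    return @target if @loaded
    @loaded = true
    @target = all(:include => includes)
  end
end

comments = FakeComments.new(%w[first second])
comments.load_target_with_includes([:commenter, :tags]).length # simulated query
comments.load_target_with_includes([:commenter, :tags]).length # reused
hits = comments.db_hits
```

After both calls `hits` is 1, while calling `all` twice would have counted two queries.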

Big updates block database, use slow_update_all

Sometimes big updates that affect millions of rows kill our database (all queries hang/are blocked).
Therefore we built a simple solution:

class ActiveRecord::Base
  def self.slow_update_all(set, where, options={})
    ids_to_update = find_values(:select => :id, :conditions => where)
    ids_to_update.each_slice(10_000) do |slice|
      update_all(set, :id => slice)
      sleep options[:sleep] if options[:sleep]
    end
    ids_to_update.size
  end
end


This needs the ActiveRecord find_values extension.
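
The batching itself is just each_slice plus an optional sleep, which can be tried without a database. A rough sketch: `slow_each_batch` and its arguments are made-up names, and the block stands in for the update_all call.

```ruby
# Plain-Ruby sketch of the batching idea (hypothetical, no database).
# Yields the ids in slices and optionally pauses between slices.
def slow_each_batch(ids, batch_size = 10_000, pause = nil)
  ids.each_slice(batch_size) do |slice|
    yield slice           # e.g. update_all(set, :id => slice)
    sleep pause if pause  # give other queries a chance to run
  end
  ids.size
end

batch_sizes = []
count = slow_each_batch((1..25).to_a, 10) { |slice| batch_sizes << slice.size }
# batch_sizes is [10, 10, 5] and count is 25
```

Smaller slices with a short sleep keep each statement short, at the cost of a longer total run time.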