I just found “Zen and the Art of Motorcycle Maintenance” for free online, a book that has been on my wish list for quite some time (recommended by a good number of programmers…).
And since I like a printed version, here is a small script to get it in a printable format
(which I post here for educational purposes only).
require 'rubygems'
require 'open-uri'
require 'hpricot'

book  = 'zen_and_the_art/zen_and_the_art'
pages = 32
text  = ''

1.upto(pages) do |i|
  # page files are numbered with two digits: _01.php, _02.php, ...
  doc = Hpricot(open("http://www.esolibris.com/ebooks/#{book}_#{i.to_s.rjust(2, '0')}.php").read)

  # remove the unwanted elements (divs, images, p.body paragraphs, links) from the content cell
  doc.search('table.body tr td[@height=40] div').remove
  doc.search('table.body tr td[@height=40] img').remove
  doc.search('table.body tr td[@height=40] p.body').remove
  doc.search('table.body tr td[@height=40] p a').remove

  # collect what is left of the content cell
  part = doc.search('table.body tr td[@height=40]')
  text << part.inner_html
end

# write all pages into a single HTML file
File.open('out.html', 'w') { |f| f.puts text }
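The zero-padded page numbers can also be built with `format` instead of `rjust`. This is just a minimal sketch: `page_urls` is a hypothetical helper name, and the URL pattern is simply the one used in the script above.

```ruby
# Hypothetical helper (not part of the original script): returns the URLs
# of all pages of a book, with page numbers zero-padded to two digits.
def page_urls(book, pages, base = 'http://www.esolibris.com/ebooks')
  (1..pages).map { |i| format('%s/%s_%02d.php', base, book, i) }
end
```

With that in place, the loop could iterate over `page_urls(book, pages)` and pass each URL to `open`.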
And a great example of the power of hpricot as well, thanks!
Normally I am quite happy with Hpricot, but that I cannot do this:
item = doc.search('table.body tr td[@height=40]')
item.remove('a')
was a little bit frustrating/un-DRY…
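One way to get a bit closer to that is a small wrapper that builds the nested selectors in a loop. This is only a sketch: `strip_nested` is a made-up name, not something Hpricot provides, and it relies only on the `search` and `remove` calls the script already uses.

```ruby
# Hypothetical helper (not part of Hpricot): removes every element matching
# "<base> <child>" for each child selector, then returns the elements
# matching the base selector.
def strip_nested(doc, base, children)
  children.each { |child| doc.search("#{base} #{child}").remove }
  doc.search(base)
end
```

Inside the loop this would replace the four `remove` lines and the final `search` with `part = strip_nested(doc, 'table.body tr td[@height=40]', ['div', 'img', 'p.body', 'p a'])`.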