Grabbing Zen and the Art of Motorcycle Maintenance

I just found “Zen and the Art of Motorcycle Maintenance” for free online, a book that had been on my wish list for quite some time (recommended by a good number of programmers…).

And since I prefer a printed version, here is a small script to get it into a printable format
(which I post here for educational purposes only):

require 'rubygems'
require 'open-uri'
require 'hpricot'

book  = 'zen_and_the_art/zen_and_the_art'
pages = 32
text  = ''

# Fetch each numbered page (01..32) and keep only the main text cell.
1.upto(pages) do |i|
  doc = Hpricot(open("http://www.esolibris.com/ebooks/#{book}_#{i.to_s.rjust(2, '0')}.php").read)
  # Strip the div, img, p.body and linked elements from the cell before extracting its HTML.
  doc.search('table.body tr td[@height=40] div').remove
  doc.search('table.body tr td[@height=40] img').remove
  doc.search('table.body tr td[@height=40] p.body').remove
  doc.search('table.body tr td[@height=40] p a').remove
  part = doc.search('table.body tr td[@height=40]')
  text += part.inner_html
end

# Write the combined pages to a single HTML file for printing.
File.open('out.html', 'w') { |f| f.puts text }
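
The page URLs follow a simple pattern: the book path, an underscore, and a two-digit, zero-padded page number. As a small sketch of just that step, the URL building could be pulled into a helper. The name `page_url` and its `base` parameter are made up for illustration and are not part of the script above; `format` with `%02d` is used here as an alternative to `rjust`.

```ruby
# Hypothetical helper (not in the original script): builds the URL for one page.
# '%02d' zero-pads the page number to two digits, like i.to_s.rjust(2, '0').
def page_url(book, page, base = 'http://www.esolibris.com/ebooks')
  format('%s/%s_%02d.php', base, book, page)
end

# Build the list of all 32 page URLs without fetching anything.
urls = (1..32).map { |i| page_url('zen_and_the_art/zen_and_the_art', i) }

puts urls.first
puts urls.last
```

Keeping the URL logic separate makes it easy to check the pattern before hitting the server 32 times.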

2 thoughts on “Grabbing Zen and the Art of Motorcycle Maintenance”

  1. Normally I am quite happy with Hpricot, but not being able to do this:

    item = doc.search('table.body tr td[@height=40]')
    item.remove('a')

    was a little bit frustrating/un-DRY…
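
    One way to cut the repetition without relying on any extra Hpricot feature might be to generate the four selectors from a single base string and loop over them. This is only a sketch: the helper name `scoped_selectors` is made up, and the Hpricot call is left as a comment so the snippet runs on its own.

```ruby
# Hypothetical helper: prefixes each child selector with one shared base selector.
def scoped_selectors(base, children)
  children.map { |child| "#{base} #{child}" }
end

cell = 'table.body tr td[@height=40]'
selectors = scoped_selectors(cell, ['div', 'img', 'p.body', 'p a'])

selectors.each { |s| puts s }

# With Hpricot loaded and a parsed doc, the four remove calls become one loop:
#   selectors.each { |s| doc.search(s).remove }
```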
