When grabbing pages, your user agent can undermine your efforts at ‘not getting sued’. It's easy to see that the so-called
‘WWW-Mechanize/0.8.5 (http://rubyforge.org/projects/mechanize/)’ browser (Mechanize's default user agent) may not really be ‘browsing’…
So, in order to stay undetected, we change our user agent:
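require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'activesupport' # for Array#rand

class MyGrabber
  def grab
    # Mechanize needs an absolute URL here, scheme included
    doc = Hpricot(agent.get('http://www.lawyer-rich-company.com').body)
    puts doc.search('#secret_info span').inner_html
  end

  # Build the agent once and reuse it on later calls
  def agent
    return @agent if @agent
    @agent = WWW::Mechanize.new
    # 'Mechanize' is itself one of the aliases, so leave it out of the draw
    browsers = WWW::Mechanize::AGENT_ALIASES.keys - ['Mechanize']
    @agent.user_agent_alias = browsers.rand
    # login or what not...
    @agent
  end
end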
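
If you would rather not pull in ActiveSupport just for the random pick, plain Ruby (1.9 and later) has Array#sample, which does the same job. Here is a minimal sketch of the selection step on its own. SAMPLE_ALIASES and random_browser_alias are made-up names for illustration, and the user-agent strings in the table (apart from the Mechanize default quoted above) are shortened placeholders, not the library's real values.

```ruby
# Stand-in for Mechanize's alias table. The entries are illustrative
# assumptions, not the library's actual list.
SAMPLE_ALIASES = {
  'Mechanize'     => 'WWW-Mechanize/0.8.5 (http://rubyforge.org/projects/mechanize/)',
  'Windows IE 7'  => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
  'Mac Safari'    => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X) Safari',
  'Linux Mozilla' => 'Mozilla/5.0 (X11; U; Linux i686) Gecko'
}

# Hypothetical helper: pick a random alias name, skipping any alias that
# would give the scraper away.
def random_browser_alias(aliases, excluded = ['Mechanize'])
  candidates = aliases.keys - excluded
  raise ArgumentError, 'no usable aliases left' if candidates.empty?
  candidates.sample
end

chosen = random_browser_alias(SAMPLE_ALIASES)
puts "#{chosen}: #{SAMPLE_ALIASES[chosen]}"
```

In the grabber above, the same helper could be fed the real table, along the lines of @agent.user_agent_alias = random_browser_alias(WWW::Mechanize::AGENT_ALIASES).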