Ruby enumerators (and a script to list all Google search results)

A while ago I carefully crafted a Google searcher in sed and shell script. Making scripts that do HTTP requests is cool! (I say carefully because it is a pain to parse HTML with regular expressions :))

The Code

Nokogiri is a Ruby library for searching HTML, and a very simple one to use. Its start page shows a short script that searches Google. It’s beautiful.

But that script only looks at the first page of results, not all of them. So I modified it a bit:

#!/usr/bin/env ruby

# https://iamstealingideas.wordpress.com/2010/05/23/ruby-enumerators-and-ascript-to-list-all-google-search-results/

require 'open-uri'
require 'nokogiri'

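class Search
  include Enumerable

  # terms: array of search words; each one is escaped, then they are joined with "+".
  def initialize(terms)
    @terms = terms.map { |u| escape u }.reduce { |a, b| "#{a}+#{b}" }
  end

  # Yields the URL of every result, fetching at most 10 pages of 100 results each.
  def each
    10.times do |n|
      url = "http://www.google.com/search?start=#{100*n}&num=100&q=#{@terms}"

      doc = Nokogiri::HTML(open(url))
      # Stop once n exceeds the number of page links in the navigation table.
      break if n > doc.css('table#nav a.fl').length
      doc.css('h3.r a.l').each { |p| yield p['href'] }
    end
  end

  private

  # Percent-encode every character that is not a URI unreserved character.
  def escape(u)
    URI.escape u, Regexp.new("[^#{URI::PATTERN::UNRESERVED}]")
  end
end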

fail "Usage: #{$0} terms" if $*.empty?

Search.new($*).each { |p| puts p }

It was tested in Ruby 1.9.

It’s executable: save it as script, run chmod +x script, and try ./script site:iamstealingideas.wordpress.com

Objects of the Search class have an each method, which receives a block and calls it for every search result. (In this case, the block just prints the URL with puts.)

But they also have other methods. Let’s suppose your terminal is being flooded with too many results. (Maybe I’m being too optimistic.) If you want just the first 7 results, you may use the first method. Replace the last line of the script with:

Search.new($*).first(7).each { |p| puts p }

(It may make no difference). What about counting the available results?

puts Search.new($*).count
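To see what first and count do to the underlying iteration without hitting Google, here is a small sketch. PagedNumbers is a made-up stand-in for Search (not part of the script above): its pages come from memory, and it records how many of them were "fetched".

```ruby
# Hypothetical stand-in for Search: pages come from memory, not from Google.
class PagedNumbers
  include Enumerable

  attr_reader :pages_fetched

  def initialize(pages, per_page)
    @pages = pages
    @per_page = per_page
    @pages_fetched = 0
  end

  def each
    @pages.times do |n|
      @pages_fetched += 1 # pretend this is one HTTP request
      @per_page.times { |i| yield n * @per_page + i }
    end
  end
end

early = PagedNumbers.new(10, 100)
first_seven = early.first(7)
puts first_seven.inspect   # [0, 1, 2, 3, 4, 5, 6]
puts early.pages_fetched   # 1: first stops iterating once it has 7 items

full = PagedNumbers.new(10, 100)
total = full.count
puts total                 # 1000
puts full.pages_fetched    # 10: count has to walk every page
```

So, at least in this sketch, first(7) is cheap because it stops each early, while count pays for every page.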

Enumerable classes

The trick here is to include the Enumerable mixin. This will give a lot of methods to the class, all based on your each method. Just like, say, an Array:

[1, 2, 3].each { |p| puts p }
puts [1, 2, 3].count
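Here is a minimal sketch of the same trick on a toy class, so it can run without any network access. Vowels is a made-up name; the only method written by hand is each, and everything else comes from the mixin. The enum_for line is an optional extra that lets each be called without a block.

```ruby
# Hypothetical example class: only #each is written by hand.
class Vowels
  include Enumerable

  def each
    return enum_for(:each) unless block_given? # allow each without a block
    %w[a e i o u].each { |v| yield v }
  end
end

vowels = Vowels.new
puts vowels.map(&:upcase).inspect          # ["A", "E", "I", "O", "U"]
puts vowels.include?("e")                  # true
puts vowels.select { |v| v > "e" }.inspect # ["i", "o", "u"]
puts vowels.sort.reverse.first             # u
puts vowels.each_slice(2).to_a.inspect     # [["a", "e"], ["i", "o"], ["u"]]
```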

In fact, it has all these methods, but it doesn’t have, say, []. You may convert it to an Array using entries. Replace the last line with:

puts Search.new($*).entries[723]

You will get the result at index 723 (the 724th, since arrays count from zero). Yeah, you might try a keyword with more results, like ./script love.
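A quick sketch of the conversion, again with a made-up class (FakeResults, with five fake URLs) instead of a real search:

```ruby
# Hypothetical stand-in for Search with a handful of fake results.
class FakeResults
  include Enumerable

  def each
    5.times { |i| yield "http://example.com/#{i}" }
  end
end

results = FakeResults.new
puts results.respond_to?(:[])  # false: Enumerable does not provide indexing

list = results.entries         # same as results.to_a
puts list.class                # Array
puts list[3]                   # http://example.com/3 (the fourth result)
puts list[723].inspect         # nil: past the end, no error is raised
```

Note that entries has to run the whole each before you can index anything, so for the real Search it means fetching every page.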

More

If you have trouble understanding the code, you may want to read about blocks and methods, mixins (more on mixins), and enumerators. Or ask in the comments 🙂

IMO, Ruby seems like a nice substitute for shell script. 🙂

Some random things:

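  • Another way is to subclass the Enumerator class. Mixins seem to take precedence over the superclass at method lookup (methods defined in the class itself come first, then included modules, then the superclass), so keep that in mind if you want to implement some of those methods yourself.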
  • That escape code was found there.
  • Stopping the search was tricky. I’m counting the number of pages.
  • &safe=off&filter=0 is probably useful. Also see this Google help.
  • Map and reduce are used a lot in functional programming. (Map is also known as collect; reduce, as inject or fold). Do you think this code is readable?
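To judge the readability question, here is the map and reduce step from the script on its own, next to its aliases and to a plain join. This is only a sketch: URI.encode_www_form_component is my assumption for a stand-in escaper (it is not the escape code used above), and the terms are made up.

```ruby
require 'uri'

terms = ["ruby", "a b", "c&d"]

# Assumption: encode_www_form_component stands in for the post's escape helper.
escaped = terms.map { |t| URI.encode_www_form_component(t) }

# reduce with a block and no initial value folds the array left to right.
with_reduce = escaped.reduce { |a, b| "#{a}+#{b}" }

# collect and inject are aliases of map and reduce.
with_inject = terms.collect { |t| URI.encode_www_form_component(t) }
                   .inject { |a, b| "#{a}+#{b}" }

# For string concatenation with a separator, join does the same job.
with_join = escaped.join("+")

puts with_reduce                 # ruby+a+b+c%26d
puts with_reduce == with_inject  # true
puts with_reduce == with_join    # true
```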

About Elias

Some random geek