
Writing simple ruby utilities for Google IMAP + OAuth 2.0


(blogarhythm ~ Unpretty/Fanmail: TLC)

There are some good ruby gems available for dealing with OAuth 2.0 and talking to Google APIs, for example:

  • google-api-client, the official Google API Ruby Client, makes it trivial to discover and access supported APIs
  • oauth2-client provides generic OAuth 2.0 support that works not just with Google
  • gmail_xoauth implements XOAUTH2 for use with Ruby Net::IMAP and Net::SMTP
  • gmail provides a rich Ruby-esque interface to GMail but you need to pair it with gmail_xoauth for OAuth 2 support (also seems that it's in need of a new release to merge in various updates and extensions people have been working on)

For the task I had at hand, I just wanted something simple: connect to a mailbox, look for certain messages, download and do something with the attachments and exit. It was going to be a simple utility to put on a cron job.

No big deal. The first version simply used gmail_xoauth to enable OAuth 2.0 support for IMAP, and I added some supporting routines to handle access_token refreshing.

It worked fine as a quick and dirty solution, but had a few code smells. Firstly, too much plumbing code. But most heinously - you might have seen this yourself if you've done any client utilities with OAuth - it used the widely-recommended oauth2.py Python script to orchestrate the initial authorization. For a ruby tool!

Enter the GmailCli gem

So I refactored the plumbing into a new gem called gmail_cli. It is intended for one thing: a super-simple way to whip up utilities that talk to Google IMAP, with all the OAuth 2.0 support you need. It actually uses google-api-client and gmail_xoauth under the covers for the heavy lifting, but wraps them up in a neat package with the simplest interface possible. Feel free to go use and fork it!

With gmail_cli in your project, there are just 3 things to do:

  1. If you haven't already, create your API project credentials in the Google APIs console (on the "API Access" tab)
  2. Use the built-in rake task or command-line to do the initial authorization. You would normally need to do this only once for each deployment:
    $ rake gmail_cli:authorize client_id='id' client_secret='secret'
    $ gmail_cli authorize --client_id 'id' --client_secret 'secret'
  3. Use the access and refresh tokens generated in step 2 to get an IMAP connection in your code. This interface takes care of refreshing the access token for you as required each time you use it:
    # how you store or set the credentials Hash is up to you, but it should have the following keys:
    credentials = {
      client_id: 'xxxx',
      client_secret: 'yyyy',
      access_token: 'aaaa',
      refresh_token: 'rrrr',
      username: 'name@gmail.com'
    }
    imap = GmailCli.imap_connection(credentials)
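
Once you have the connection, the rest is ordinary IMAP. As a rough sketch, assuming the object returned by GmailCli.imap_connection behaves like a standard Net::IMAP connection (the helper name unseen_subjects is made up for illustration and is not part of gmail_cli), listing the subjects of unread messages might look something like this:

```ruby
# Hypothetical helper (not part of gmail_cli): lists the subjects of unread
# messages. Assumes `imap` responds to select/search/fetch like Net::IMAP.
def unseen_subjects(imap, mailbox = 'INBOX')
  imap.select(mailbox)
  ids = imap.search(['UNSEEN'])
  return [] if ids.empty?

  # fetch may return nil when nothing matches, so wrap it with Array()
  Array(imap.fetch(ids, 'ENVELOPE')).map { |msg| msg.attr['ENVELOPE'].subject }
end
```

In a real script you would call it as unseen_subjects(GmailCli.imap_connection(credentials)), then go on to fetch and process whichever messages you care about.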

A Better Way?

Polling a mailbox is a terrible thing to have to do, but sometimes network restrictions or the architecture of your solution makes it the best viable option. Much better is to be reactive to mail that gets pushed to you as it is delivered.

I've written before about Mandrill, which is the transactional email service from the same folks who do MailChimp. I kinda love it;-) It is perfect if you want to get inbound mail pushed to your application instead of polling for it. And if you run Rails, I really would encourage you to check out the mandrill-rails gem - it adds Mandrill inbound mail processing to my Rails apps with just a couple of lines of code.


Ruby Tuesday

(blogarhythm ~ Ruby - Kaiser Chiefs)
@a_matsuda convinced us to dive into Ruby 2.0 at RedDotRubyConf, so I guess this must be the perfect day of the week for it!

Ruby 2.0.0 is currently at p195, and we heard at the conference how stable and compatible it is.

One change we learned about that may catch us out, if we do much multilingual work that's not already Unicode, is that Ruby now assumes UTF-8 encoding for source files. So the special "encoding: utf-8" marker becomes redundant, but if we don't include it the behaviour in 2.0.0 can differ from earlier versions:

$ cat encoding_binary.rb 
s = "\xE3\x81\x82"
p str: s, size: s.size
$ ruby -v encoding_binary.rb
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin11.4.2]
{:str=>"あ", :size=>1}
$ ruby -v encoding_binary.rb
ruby 1.9.3p429 (2013-05-15 revision 40747) [x86_64-darwin11.4.2]
{:str=>"\xE3\x81\x82", :size=>3}
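
If the same script has to behave identically on both versions, one option is to stop relying on the source encoding and say explicitly how the bytes should be interpreted. This is just a minimal sketch of that idea using String#force_encoding (the variable names are only for illustration):

```ruby
raw = "\xE3\x81\x82"

# Treat the data as raw bytes, whatever the source encoding happened to be.
as_bytes = raw.dup.force_encoding(Encoding::BINARY)

# Treat the same bytes as UTF-8 text.
as_text = raw.dup.force_encoding(Encoding::UTF_8)

p bytes: as_bytes.size, chars: as_text.size, valid: as_text.valid_encoding?
```

The binary view reports 3 bytes and the UTF-8 view reports 1 valid character, regardless of which default the interpreter applied to the literal.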

Quickstart on MacOSX with RVM

I use rvm to help manage various Ruby installs on my Mac, and trying out new releases is exactly the time you want its assistance to prevent screwing up your machine. There were only two main things I needed to take care of to get Ruby 2 installed and running smoothly:
  1. Update rvm so it knows about the latest Ruby releases
  2. Update my OpenSSL installation (it seems 1.0.1e is required although I haven't found that specifically documented anywhere)
Here's a rundown of the procedure I used in case it helps (note, I am running MacOSX 10.7.5 with Xcode 4.6.2). First I updated rvm and attempted to install 2.0.0:
$ rvm get stable
# => updated ok
$ rvm install ruby-2.0.0
Searching for binary rubies, this might take some time.
No binary rubies available for: osx/10.7/x86_64/ruby-2.0.0-p195.
Continuing with compilation. Please read 'rvm mount' to get more information on binary rubies.
Installing requirements for osx, might require sudo password.
-bash: /usr/local/Cellar/openssl/1.0.1e/bin/openssl: No such file or directory
Updating certificates in ''.
mkdir: : No such file or directory
Password:
mkdir: : No such file or directory
Can not create directory '' for certificates.
Not good!!! What's all that about? Turns out to be just a very clumsy way of telling me I don't have OpenSSL 1.0.1e installed.

I already have OpenSSL 1.0.1c installed using brew (so it doesn't mess with the MacOSX system-installed OpenSSL), so updating is simply:
$ brew upgrade openssl
==> Summary
/usr/local/Cellar/openssl/1.0.1e: 429 files, 15M, built in 5.0 minutes
So then I can try the Ruby 2 install again, starting with the "rvm requirements" command to first make sure all pre-requisites are installed:
$ rvm requirements
Installing requirements for osx, might require sudo password.
[...]
Tapped 41 formula
Installing required packages: apple-gcc42.................
Updating certificates in '/usr/local/etc/openssl/cert.pem'.
$ rvm install ruby-2.0.0
Searching for binary rubies, this might take some time.
No binary rubies available for: osx/10.7/x86_64/ruby-2.0.0-p195.
Continuing with compilation. Please read 'rvm mount' to get more information on binary rubies.
Installing requirements for osx, might require sudo password.
Certificates in '/usr/local/etc/openssl/cert.pem' already are up to date.
Installing Ruby from source to: /Users/paulgallagher/.rvm/rubies/ruby-2.0.0-p195, this may take a while depending on your cpu(s)
[...]
$
OK, this time it installed cleanly as I can quickly verify:
$ rvm use ruby-2.0.0
$ ruby -v
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin11.4.2]
$ irb -r openssl
2.0.0p195 :001 > OpenSSL::VERSION
=> "1.1.0"
2.0.0p195 :002 > OpenSSL::OPENSSL_VERSION
=> "OpenSSL 1.0.1e 11 Feb 2013"


Optimising presence in Rails with PostgreSQL

(blogarhythm ~ Can't Happen Here - Rainbow)
It is a pretty common pattern to branch depending on whether a query returns any data - for example to render a quite different view. In Rails we might do something like this:

query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.presence
  results.each {|row| ... }
else
  # do something else
end
When this code executes, we issue at least 2 database requests: one to check presence, and another to retrieve the data. Running this at the Rails console, we can see the queries logged as they execute, for example:
(0.9ms)  SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL
User Load (15.2ms) SELECT "users".* FROM "users" WHERE "users"."deleted_at" IS NULL
This is not surprising since under the covers, presence (or present?) ends up calling count, which must do the database query (unless you have already accessed/loaded the result set). And 0.9ms doesn't seem too high a price to pay to determine if you should even try to load the data, does it?

But when we are running on PostgreSQL in particular, we've learned to be leery of COUNT(*) due to its well-known performance problems. In fact I first started digging into this question when I started seeing expensive COUNT(*) queries show up in NewRelic slow transaction traces. How expensive COUNT(*) actually is depends on many factors including the complexity of the query, availability of indexes, size of the table, and size of the result set.

So can we improve things by avoiding the COUNT(*) query? Assuming we are going to use all the results anyway, and we haven't injected any calculated columns in the query, we could simply to_a the query before testing presence i.e.:
query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.to_a.presence
  results.each {|row| ... }
else
  # do something else
end

I ran some benchmarks comparing the two approaches with different kinds of queries on a pretty well-tuned system and here are some of the results:
Query                                                      Using present?  Using to_a  Faster By
10k indexed queries returning 1 / 1716 rows                17.511s         10.938s     38%
4k complex un-indexed queries returning 12 / 1716 rows     23.603s         15.221s     36%
4k indexed queries returning 1 / 1763218 rows              22.943s         20.924s     9%
10 complex un-indexed queries returning 15 / 1763218 rows  23.196s         14.072s     40%

Clearly, depending on the type of query we can gain up to 40% performance improvement by restructuring our code a little. While my aggregate results were fairly consistent over many runs, the performance of individual queries did vary quite widely.

I should note that the numbers were *not* consistent or proportional across development, staging, test and production environments (mainly due to differences in data volumes, latent activity and hardware) - so you can't benchmark on development and assume the same applies in production.
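
If you want to run a comparison like this in your own environment, a small harness built on Ruby's standard Benchmark module is enough. This is only a sketch and not the exact script behind the numbers above; time_strategies is a made-up helper name, and the User query in the comment is a placeholder for your own scope:

```ruby
require 'benchmark'

# Hypothetical helper: times each named strategy over `iterations` runs and
# returns a Hash of label => elapsed seconds.
def time_strategies(iterations, strategies)
  strategies.each_with_object({}) do |(label, strategy), timings|
    timings[label] = Benchmark.realtime { iterations.times { strategy.call } }
  end
end

# In a Rails console, the two approaches might be compared like this
# (User and the scope are placeholders for your own query):
#
#   time_strategies(1000, {
#     'present?' => -> { (r = User.where(deleted_at: nil).presence) && r.to_a },
#     'to_a'     => -> { User.where(deleted_at: nil).to_a.presence }
#   })
```

Each lambda builds a fresh query on every call so that one run can't benefit from rows already loaded by a previous one, although database-level caching will still influence the timings.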

Things get murky with ActiveRecord add-ons

So far we've talked about the standard ActiveRecord situation. But there are various gems we might also be using to add features like pagination and search magic. MetaSearch is an example: a pretty awesome gem for building complex and flexible search features. But (at least with version 1.1.3) present? has a little surprise in store for you:
irb> User.where(id: '0').class
=> ActiveRecord::Relation
irb> User.where(id: 0).present?
(0.8ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" = 0
=> false
irb> User.search(id_eq: 0).class
=> MetaSearch::Searches::User
irb> User.search(id_eq: 0).present?
=> true

Any Guidelines?

So, always to_a my query results? Well, no, it's not that simple. Here are some things to consider:
  • First, don't assume that <my_scoped_query>.present? means what you think it might mean - test or play it safe
  • If you are going to need all result rows anyway, consider calling to_a or similar before testing presence
  • Avoid this kind of optimisation except at the point of use. One of the beauties of ActiveRecord::Relation is the chainability - something we'll kill as soon as we hydrate to a result set Array for example.
  • While I got a nice 40% performance bonus in some cases with a minor code fiddle, mileage varies and much depends on the actual query. You probably want to benchmark in the actual environment that matters and not make any assumptions.
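
To make the "load first, then test" guideline concrete, here is a minimal sketch of a helper that doesn't depend on how present? happens to be implemented by whatever object you are holding. The name loaded_rows_or_nil is hypothetical, and it assumes the query object responds to to_a (as ActiveRecord::Relation does); I haven't checked that every add-on's search object does:

```ruby
# Hypothetical helper: loads the rows once and returns them as an Array,
# or nil when there are none. Assumes `query` responds to to_a.
def loaded_rows_or_nil(query)
  rows = query.to_a
  rows.empty? ? nil : rows
end

# Intended use at the point where the results are consumed:
#
#   if results = loaded_rows_or_nil(query)
#     results.each { |row| ... }
#   else
#     # do something else
#   end
```

Because it returns a plain Array, this belongs at the very end of the chain, in keeping with the point above about not killing chainability too early.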


My Virtual Swag from #rdrc

(blogarhythm ~ Everybody's Everything - Santana)

So the best swag you can get from a technology conference is code, right? Well RedDotRubyConf 2013 did not disappoint! Thanks to some fantastic speakers, my weekends for months to come are spoken for. Here's just some of the goodness:


read more and comment..