my recent reads..

Ruby Tuesday

(blogarhythm ~ Ruby - Kaiser Chiefs)
@a_matsuda convinced us to dive into Ruby 2.0 at RedDotRubyConf, so I guess this must be the perfect day of the week for it!

Ruby 2.0.0 is currently at p195, and we heard at the conference how stable and compatible it is.

One change we learned that may catch us if we do much multilingual work that's not already unicode is the change that Ruby now assumes UTF-8 encoding for source files. So the special "encoding: utf-8" marker becomes redundant, but if we don't include it the behaviour in 2.0.0 can differ from earlier versions:

$ cat encoding_binary.rb 
s = "\xE3\x81\x82"
p str: s, size: s.size
$ ruby -v encoding_binary.rb
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin11.4.2]
{:str=>"あ", :size=>1}
$ ruby -v encoding_binary.rb
ruby 1.9.3p429 (2013-05-15 revision 40747) [x86_64-darwin11.4.2]
{:str=>"\xE3\x81\x82", :size=>3}

Quickstart on MacOSX with RVM

I use rvm to help manage various Ruby installs on my Mac, and trying out new releases is exactly the time you want it's assistance to prevent screwing up your machine. There were only two main things I needed to take care of to get Ruby 2 installed and running smoothly:
  1. Update rvm so it knows about the latest Ruby releases
  2. Update my OpenSSL installation (it seems 1.0.1e is required although I haven't found that specifically documented anywhere)
Here's a rundown of the procedure I used in case it helps (note, I am running MacOSX 10.7.5 with Xcode 4.6.2). First I updated rvm and attempted to install 2.0.0:
$ rvm get stable
# => updated ok
$ rvm install ruby-2.0.0
Searching for binary rubies, this might take some time.
No binary rubies available for: osx/10.7/x86_64/ruby-2.0.0-p195.
Continuing with compilation. Please read 'rvm mount' to get more information on binary rubies.
Installing requirements for osx, might require sudo password.
-bash: /usr/local/Cellar/openssl/1.0.1e/bin/openssl: No such file or directory
Updating certificates in ''.
mkdir: : No such file or directory
mkdir: : No such file or directory
Can not create directory '' for certificates.
Not good!!! What's all that about? Turns out to be just a very clumsy way of telling me I don't have OpenSSL 1.0.1e installed.

I already have OpenSSL 1.0.1c installed using brew (so it doesn't mess with the MacOSX system-installed OpenSSL), so updating is simply:
$ brew upgrade openssl
==> Summary
/usr/local/Cellar/openssl/1.0.1e: 429 files, 15M, built in 5.0 minutes
So then I can try the Ruby 2 install again, starting with the "rvm requirements" command to first make sure all pre-requisites are installed:
$ rvm requirements
Installing requirements for osx, might require sudo password.
Tapped 41 formula
Installing required packages: apple-gcc42.................
Updating certificates in '/usr/local/etc/openssl/cert.pem'.
$ rvm install ruby-2.0.0
Searching for binary rubies, this might take some time.
No binary rubies available for: osx/10.7/x86_64/ruby-2.0.0-p195.
Continuing with compilation. Please read 'rvm mount' to get more information on binary rubies.
Installing requirements for osx, might require sudo password.
Certificates in '/usr/local/etc/openssl/cert.pem' already are up to date.
Installing Ruby from source to: /Users/paulgallagher/.rvm/rubies/ruby-2.0.0-p195, this may take a while depending on your cpu(s)
OK, this time it installed cleanly as I can quickly verify:
$ rvm use ruby-2.0.0
$ ruby -v
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin11.4.2]
$ irb -r openssl
2.0.0p195 :001 > OpenSSL::VERSION
=> "1.1.0"
2.0.0p195 :002 > OpenSSL::OPENSSL_VERSION
=> "OpenSSL 1.0.1e 11 Feb 2013"

read more and comment..

Optimising presence in Rails with PostgreSQL

(blogarhythm ~ Can't Happen Here - Rainbow)
It is a pretty common pattern to branch depending on whether a query returns any data - for example to render a quite different view. In Rails we might do something like this:

query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.presence
results.each {|row| ... }
# do something else
When this code executes, we raise at least 2 database requests: one to check presence, and another to retrieve the data. Running this at the Rails console, we can see the queries logged as they execute, for example:
(0.9ms)  SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL
User Load (15.2ms) SELECT "users".* FROM "users" WHERE "users"."deleted_at" IS NULL
This is not surprising since under the covers, presence (or present?) end up calling count which must do the database query (unless you have already accessed/loaded the results set). And 0.9ms doesn't seem too high a price to pay to determine if you should even try to load the data, does it?

But when we are running on PostgreSQL in particular, we've learned to be leery of COUNT(*) due to it's well known performance problems. In fact I first started digging into this question when I started seeing expensive COUNT(*) queries show up in NewRelic slow transaction traces. How expensive COUNT(*) actually is depends on many factors including the complexity of the query, availability of indexes, size of the table, and size of the results set.

So can we improve things by avoiding the COUNT(*) query? Assuming we are going to use all the results anyway, and we haven't injected any calculated columns in the query, we could simply to_a the query before testing presence i.e.:
query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.to_a.presence
results.each {|row| ... }
# do something else

I ran some benchmarks comparing the two approaches with different kinds of queries on a pretty well-tuned system and here are some of the results:
QueryUsing present?Using to_aFaster By
10k indexed queries returning 1 / 1716 rows17.511s10.938s38%
4k complex un-indexed queries returning 12 / 1716 rows23.603s15.221s36%
4k indexed queries returning 1 / 1763218 rows22.943s20.924s9%
10 complex un-indexed queries returning 15 / 1763218 rows23.196s14.072s40%

Clearly, depending on the type of query we can gain up to 40% performance improvement by restructuring our code a little. While my aggregate results were fairly consistent over many runs, the performance of individual queries did vary quite widely.

I should note that the numbers were *not* consistent or proportional across development, staging, test and production environments (mainly due to differences in data volumes, latent activity and hardware) - so you can't benchmark on development and assume the same applies in production.

Things get murky with ActiveRecord add-ons

So far we've talked about the standard ActiveRecord situation. But there are various gems we might also be using to add features like pagination and search magic. MetaSearch is an example: a pretty awesome gem for building complex and flexible search features. But (at least with version 1.1.3) present? has a little surprise in store for you:
irb> User.where(id: '0').class
=> ActiveRecord::Relation
irb> User.where(id: 0).present?
(0.8ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" = 0
=> false
irb> 0).class
=> MetaSearch::Searches::User
irb> 0).present?
=> true

Any Guidelines?

So, always to_a my query results? Well, no, it's not that simple. Here are some things to consider:
  • First, don't assume that <my_scoped_query>.present? means what you think it might mean - test or play it safe
  • If you are going to need all result rows anyway, consider calling to_a or similar before testing presence
  • Avoid this kind of optimisation except at the point of use. One of the beauties of ActiveRecord::Relation is the chainability - something we'll kill as soon as we hydrate to a result set Array for example.
  • While I got a nice 40% performance bonus in some cases with a minor code fiddle, mileage varies and much depends on the actual query. You probably want to benchmark in the actual environment that matters and not make any assumptions.

read more and comment..

My Virtual Swag from #rdrc

(blogarhythm ~ Everybody's Everything - Santana)

So the best swag you can get from a technology conference is code, right? Well RedDotRubyConf 2013 did not disappoint! Thanks to some fantastic speakers, my weekends for months to come are spoken for. Here's just some of the goodness:

read more and comment..

Will I still be a Rubyist in 5 years? #rdrc

(blogarhythm ~ Ruby - Kaiser Chiefs)

The third RedDotRubyConf is over, and I think it just keeps getting better! Met lots of great people, and saw so many of my Ruby heroes speak on stage. Only thing that could make it even better next year would be to get the video recording thing happening!

I had the humbling opportunity to share the stage and here are my slides. Turned out to be a reflection on whether I'd still be a Rubyist in another 5 years, and what are the external trends that might change that. Short story: Yes! Of course. I'll always think like a Rubyist even though things will probably get more polyglot. The arena of web development is perhaps the most unpredictable though.

A couple of areas I highlight that really need a bit more love include:

  • There's a push on SciRuby. Analytics are no longer the esoteric domain of bioinformaticists. Coupled with Big Data (which Ruby is pretty good at), analytics are driving much of the significant innovation in things we build.
  • Krypt - an effort lead by Martin Boßlet to improve the cryptographic support in Ruby. My experience building megar made it painfully obvious why we need to fix this.

Let it never be said, the romance is dead
'Cos there’s so little else occupying my head

I mentioned a few of my projects in passing. Here are the links for convenience:
  • RGovData is a ruby library for really simple access to government data. It aims to make consuming government data sets a "one liner", letting you focus on what you are trying to achieve with the data, and happily ignore all the messy underlying details of transport protocols, authentication and so on.
  • sps_bill_scanner is a ruby gem for converting SP Services PDF invoices into data that can be analysed with R. Only useful if you are an SP Services subscriber in Singapore, but other wise perhaps an interesting example of extracting postitional text from PDF and doing some R.
  • megar ("megaargh!" in pirate-speak) is a Ruby wrapper and command-line (CLI) client for the API. My example of how you *can* do funky crypto in Ruby ... it's just much harder than it should be!

read more and comment..